Abstract. We investigate some basic questions about the interaction of regular and rational
relations on words. The primary motivation comes from the study of logics for querying
graph topology, which have recently found numerous applications. Such logics use conditions
on paths expressed by regular languages and relations, but they often need to be extended
by rational relations such as subword or subsequence. Evaluating formulae in such extended
graph logics boils down to checking nonemptiness of the intersection of rational relations
with regular or recognizable relations (or, more generally, to the generalized intersection
problem, asking whether some projections of a regular relation have a nonempty intersection
with a given rational relation).

We prove that for several basic and commonly used rational relations, the intersection
problem with regular relations is either undecidable (e.g., for subword or suffix, and
some generalizations), or decidable with non-primitive-recursive complexity (e.g., for
subsequence and its generalizations). These results are used to rule out many classes of graph
logics that freely combine regular and rational relations, as well as to provide the simplest
problem related to verifying lossy channel systems that has non-primitive-recursive
complexity. We then prove a dichotomy result for logics combining regular conditions on
individual paths and rational relations on paths, by showing that the syntactic form of
formulae classifies them into either efficiently checkable or undecidable cases. We also give
examples of rational relations for which such logics are decidable even without syntactic
restrictions.

1. Introduction



The motivation for the problems investigated in this paper comes from the study of 
logics for querying graphs. Such logics form the basis of query languages for graph databases, 
that have recently found numerous applications in areas including biological networks, social 
networks, Semantic Web, crime detection, etc. (see [1] for a survey) and led to multiple 
systems and prototypes. In such applications, data is usually represented as a labeled graph. 
For instance, in social networks, people are nodes, and labeled edges represent different types 
of relationships between them; in RDF, the underlying data model of the Semantic Web,
data is modeled as a graph, with RDF triples naturally representing labeled edges. 

The questions that we address are related to the interaction of various classes of re- 
lations on words, for instance, rational relations (examples of those include subword and 
subsequence) or regular relations (such as prefix, or equality of words). An example of a 
question we are interested in is as follows: is it decidable whether a given regular relation 
contains a pair $(w, w')$ so that $w$ is a subword/subsequence of $w'$? Problems like this are
very basic and deserve a study on their own, but they are also necessary to answer questions 
on the power and complexity of querying graph databases. We now explain how they arise 
in that setting. 

Logical languages for querying graph data have been developed since the late 1980s 
(and some of them became precursors of languages later used for XML). They query the 
topology of the graph, often leaving querying data that might be stored in the nodes to a 
standard database engine. Such logics are quite different in their nature and applications 
from another class of graph logics based on spatial calculi [11, 18]. Their formulae combine 
various reachability patterns. The simplest form is known as regular path queries (RPQs) 
[17, 16]; they check the existence of a path whose label belongs to a regular language. Those 
are typically used as atoms and then closed under conjunction and existential quantification, 
resulting in the class of conjunctive regular path queries (CRPQs), which have been the
subject of much investigation [9, 19, 22]. For instance, a CRPQ may ask for a node $v$ such
that there exist nodes $v_1$ and $v_2$ and paths from $v$ to $v_i$ with the label in a regular language
$L_i$, for $i = 1, 2$.

The expressiveness of these queries, however, became insufficient in applications such 
as the Semantic Web or biological networks due to their inability to compare paths. For 
instance, it is a common requirement in RDF languages to compare paths based on specific 
semantic associations [2]; biological sequences often need to be compared for similarity, 
based, for example, on the edit distance. 

To address this, an extension of CRPQs with relations on paths was proposed [4]. 
It used regular relations on paths, i.e., relations given by synchronized automata [21, 23].
Equivalently, these are the relations definable in automatic structures on words [5, 7, 8]. 
They include prefix, equality, equal length of words, or fixed edit distance between words. 
The extension of CRPQs with them, called ECRPQs, was shown to have acceptable com- 
plexity (NLogSpace with respect to data, PSpace with respect to query). 

However, the expressive power of ECRPQs is still short of the expressiveness needed in 
many applications. For instance, semantic associations between paths used in RDF applica- 
tions often deal with subwords or subsequences, but these relations are not regular. They are 
rational: they are still accepted by automata, but those whose heads move asynchronously. 
Adding them to a query language must be done with extreme care: simply replacing regular 
relations with rational in the definition of ECRPQs makes query evaluation undecidable! 

So we set out to investigate the following problem: given a class of graph queries, e.g.,
CRPQs or ECRPQs, what happens if one adds the ability to test whether pairs of paths
belong to a rational relation $S$, such as subword or subsequence? We start by observing
that this problem is a generalization of the intersection problem: given a regular relation $R$
and a rational relation $S$, is $R \cap S \neq \emptyset$? It is well known that there exist rational relations $S$
for which it is undecidable [6]; however, we are not interested in artificial relations obtained 
by encoding PCP instances, but rather in very concrete relations used in querying graph 
data. 

The intersection problem captures the essence of graph logics ECRPQs and CRPQs 
(for the latter, when restricted to the class of recognizable relations [6, 15]). In fact, query 
evaluation can be cast as the generalized intersection problem. Its input includes an $m$-ary
regular relation $R$, a binary rational relation $S$, and a set $I$ of pairs from $\{1, \ldots, m\}$. It
asks whether there is a tuple $(w_1, \ldots, w_m) \in R$ so that $(w_i, w_j) \in S$ whenever $(i, j) \in I$.
For $m = 2$ and $I = \{(1, 2)\}$, this is the usual intersection problem.

Another motivation for looking at these basic problems comes from verification of lossy 
channel systems (finite-state processes that communicate over unbounded, but lossy, FIFO 
channels). Their reachability problem is known to be decidable, although the complexity is 
not bounded by any multiply-recursive function [14]. In fact, a "canonical" problem used 
in reductions showing this enormous complexity [13, 14] can be restated as follows: given a 
binary rational relation $R$, does it have a pair $(w, w')$ so that $w$ is a subsequence of $w'$? This
naturally leads to the question whether the same bounds hold for the simpler instance of 
the intersection problem when we use regular relations instead of rational ones. We actually 
show that this is true. 

Summary of results. We start by showing that evaluating CRPQs and ECRPQs extended 
with a rational relation S can be cast as the generalized intersection problem for S with 
recognizable and regular relations respectively. Moreover, the complexity of the basic in- 
tersection problem is a lower bound for the complexity of query evaluation. 

We then study the complexity of the intersection problem for fixed relations S. For 
recognizable relations, it is well known to be efficiently decidable for every rational S. For 
regular relations, we show that if S is the subword, or the suffix relation, then the problem is 
undecidable. That is, it is undecidable to check, given a binary regular relation R, whether 
it contains a pair $(w, w')$ so that $w$ is a subword of $w'$, or even a suffix of $w'$. We also present
a generalization of this result. 

The analogous problem for the subsequence relation is known to be decidable, and, if 
the input is a rational relation $R$, then the complexity is non-multiply-recursive [13]. We
extend this in two ways. First, we show that the lower bound remains true even for regular 
relations R. Second, we extend decidability to the class of all rational relations for which 
one projection is closed under subsequence (the subsequence relation itself is trivially such, 
obtained by closing the first projection of the equality relation). 

In addition to establishing some basic facts about classes of relations on words, these 
results tell us about the infeasibility of adding rational relations to ECRPQs: in fact 
adding subword makes query evaluation undecidable, and while it remains decidable with 
subsequence, the complexity is prohibitively high. 

So we then turn to the generalized intersection problem with recognizable relations, 
corresponding to the evaluation of CRPQs with an extra relation S. We show that the 
shape of the relation $I$ holds the key to decidability. If its underlying undirected graph
is acyclic, then the problem is decidable in PSpace for every rational relation $S$ (and for
a fixed formula the complexity drops to NLogSpace). In the cyclic case, the problem
is undecidable for some rational relation $S$. For relations generalizing subsequence, we
have decidability when $I$ is a DAG, and for subsequence itself, as well as for suffix, query
evaluation is decidable regardless of the shape of CRPQs. 

Thus, under the mild syntactic restriction of acyclicity of comparisons with respect to 
rational relations, such relations can be added to the common class CRPQ of graph queries, 
without incurring a high complexity cost. 

Organization. We give basic definitions in Section 2 and define the main problems we study 
in Section 3. Section 4 introduces graph logics and establishes their connection with the 
(generalized) intersection problem. Section 5 studies decidable and undecidable cases of the 
intersection problem. Section 6 looks at the case of recognizable relations and CRPQs and 
establishes decidability results based on the intersection pattern. 

2. Preliminaries 

Let $\mathbb{N} = \{1, 2, \ldots\}$, $[i..j] = \{i, i+1, \ldots, j\}$ (if $i > j$, $[i..j] = \emptyset$), and $[i] = [1..i]$. Given
$A, B \subseteq \mathbb{N}$, an increasing function $f : A \to B$ is one such that $f(i) \ge f(j)$ whenever $i > j$.
If $f(i) > f(j)$ whenever $i > j$, we call it strictly increasing.

Alphabets, languages, and morphisms. We shall use letters $\Sigma$, $\Gamma$ to denote finite alphabets.
The set of all finite words over an alphabet $\Sigma$ is denoted by $\Sigma^*$. We write $\varepsilon$ for the empty
word, $w \cdot w'$ for the concatenation of two words, and $|w|$ for the length of a word $w$. Given
a word $w \in \Sigma^*$, $w[i..j]$ stands for the substring in positions $[i..j]$, $w[i]$ for $w[i..i]$, and $w[i..]$
for $w[i..|w|]$. Positions in the word start with 1.

If $w = w' \cdot u \cdot w''$, then

• $u$ is a subword of $w$ (also called factor in the literature, written as $u \preceq w$),

• $w'$ is a prefix of $w$ (written as $w' \preceq_{\mathrm{pref}} w$), and

• $w''$ is a suffix of $w$ (written as $w'' \preceq_{\mathrm{suff}} w$).

We say that $w'$ is a subsequence of $w$ (also called subword embedding or scattered subword
in the literature, written as $w' \sqsubseteq w$) if $w'$ is obtained by removing some letters (perhaps
none) from $w$, i.e., $w = a_1 \cdots a_n$ and $w' = a_{i_1} a_{i_2} \cdots a_{i_k}$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$.
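
The four word relations just defined can be checked directly on concrete words. The following is a minimal Python sketch, given only for illustration; the function names are hypothetical and words are modeled as Python strings.

```python
def is_prefix(u: str, w: str) -> bool:
    # u is a prefix of w: w = u . w'' for some w''
    return w.startswith(u)


def is_suffix(u: str, w: str) -> bool:
    # u is a suffix of w: w = w' . u for some w'
    return w.endswith(u)


def is_subword(u: str, w: str) -> bool:
    # u is a subword (factor) of w: w = w' . u . w''
    return u in w


def is_subsequence(u: str, w: str) -> bool:
    # u is a subsequence of w: u is obtained by deleting letters of w.
    # The iterator is consumed left to right, so matched positions increase.
    remaining = iter(w)
    return all(letter in remaining for letter in u)
```

Every prefix or suffix is a subword and every subword is a subsequence, but not conversely: "bd" is a subsequence of "abcd" without being a subword of it.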

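If $\Sigma \subseteq \Gamma$ and $w \in \Gamma^*$, then by $w_\Sigma$ we denote the projection of $w$ on $\Sigma$. That is, if
$w = a_1 \cdots a_n$ and $a_{i_1}, \ldots, a_{i_k}$ are precisely the letters from $\Sigma$, with $i_1 < \cdots < i_k$, then
$w_\Sigma = a_{i_1} \cdots a_{i_k}$.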

Recall that a monoid $M = (U, \cdot, 1)$ has an associative binary operation $\cdot$ and a neutral
element $1$ satisfying $1x = x1 = x$ for all $x$ (we often write $xy$ for $x \cdot y$). The set $\Sigma^*$ with
the operation of concatenation and the neutral element $\varepsilon$ forms a monoid $(\Sigma^*, \cdot, \varepsilon)$, the free
monoid generated by $\Sigma$. A function $f : M \to M'$ between two monoids is a morphism if it
sends the neutral element of $M$ to the neutral element of $M'$, and if $f(xy) = f(x)f(y)$ for
all $x, y \in M$. Every morphism $f : (\Sigma^*, \cdot, \varepsilon) \to M$ is uniquely determined by the values $f(a)$,
for $a \in \Sigma$, as $f(a_1 \cdots a_n) = f(a_1) \cdots f(a_n)$. A morphism $f : (\Sigma^*, \cdot, \varepsilon) \to (\Gamma^*, \cdot, \varepsilon)$ is called
alphabetic if $f(a) \in \Gamma \cup \{\varepsilon\}$, and strictly alphabetic if $f(a) \in \Gamma$ for each $a \in \Sigma$, see [6].

A language $L$ is a subset of $\Sigma^*$, for some finite alphabet $\Sigma$. It is recognizable if there
is a finite monoid $M$, a morphism $f : (\Sigma^*, \cdot, \varepsilon) \to M$, and a subset $M_0$ of $M$ such that
$L = f^{-1}(M_0)$.

A language $L$ is regular if there exists an NFA (non-deterministic finite automaton)
$\mathcal{A} = (Q, \Sigma, q_0, \delta, F)$ such that $L = \mathcal{L}(\mathcal{A})$, the language of words accepted by $\mathcal{A}$. We use the
standard notation for NFAs, where $Q$ is the set of states, $q_0$ is the initial state, $F$ is the set
of final states, and $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation.

A language is rational if it is denoted by a regular expression; such expressions are built 
from $\emptyset$, $\varepsilon$, and alphabet letters by using operations of concatenation ($e \cdot e'$), union ($e \cup e'$),
and Kleene star (e*). It is of course a classical result of formal language theory that the 
classes of recognizable, regular, and rational languages coincide. 

Recognizable, regular, and rational relations. While the notions of recognizability, regularity,
and rationality coincide over languages $L \subseteq \Sigma^*$, they differ over relations over $\Sigma$, i.e.,
subsets of $\Sigma^* \times \cdots \times \Sigma^*$. We now define those (see [6, 12, 15, 21, 23, 34]).

Since $(\Sigma^*, \cdot, \varepsilon)$ is a monoid, the product $(\Sigma^*)^n$ has the structure of a monoid too. We
can thus define recognizable $n$-ary relations over $\Sigma$ as subsets $R \subseteq (\Sigma^*)^n$ so that there exists
a finite monoid $M$ and a morphism $f : (\Sigma^*)^n \to M$ such that $R = f^{-1}(M_0)$ for some
$M_0 \subseteq M$. The class of $n$-ary recognizable relations will be denoted by $\mathrm{REC}_n$; when $n$ is
clear or irrelevant, we write just REC.

It is well known that a relation $R \subseteq (\Sigma^*)^n$ is in $\mathrm{REC}_n$ iff it is a finite union of sets
of the form $L_1 \times \cdots \times L_n$, where each $L_i$ is a regular language over $\Sigma$, see [6, 21].
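
The characterization of $\mathrm{REC}_n$ as finite unions of products suggests a direct membership test. The sketch below is only an illustration under stated assumptions: each regular language is represented by a Python regular expression (a stand-in for an NFA), and the names `in_recognizable` and `products` are hypothetical.

```python
import re


def in_recognizable(words, products):
    """Test whether the tuple `words` belongs to a recognizable relation.

    `products` is a list of tuples of regex patterns; each tuple
    (p_1, ..., p_n) stands for one product L_1 x ... x L_n, and the
    relation is the union of these products.
    """
    return any(
        len(patterns) == len(words)
        and all(re.fullmatch(p, w) is not None for p, w in zip(patterns, words))
        for patterns in products
    )


# Example: R = (a* x b*) U ((ab)* x {a})
R = [("a*", "b*"), ("(ab)*", "a")]
```

Membership is checked coordinate by coordinate, which is exactly why recognizable relations cannot express comparisons between coordinates such as prefix or equality.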

Next, we define the class of regular relations. Let $\bot \notin \Sigma$ be a new alphabet letter, and
let $\Sigma_\bot$ be $\Sigma \cup \{\bot\}$. Each tuple $\bar w = (w_1, \ldots, w_n)$ of words from $\Sigma^*$ can be viewed as a word
over $\Sigma_\bot^n$ as follows: pad words $w_i$ with $\bot$ so that they all are of the same length, and use
as the $k$th symbol of the new word the $n$-tuple of the $k$th symbols of the padded words.
Formally, let $\ell = \max_i |w_i|$. Then $w_1 \otimes \cdots \otimes w_n$ is a word of length $\ell$ whose $k$th symbol is
$(a_1, \ldots, a_n) \in \Sigma_\bot^n$ such that

$$a_i = \begin{cases} \text{the } k\text{th letter of } w_i & \text{if } |w_i| \ge k \\ \bot & \text{otherwise.} \end{cases}$$

We shall also write $\otimes \bar w$ for $w_1 \otimes \cdots \otimes w_n$. We define $\pi_i(u_1 \otimes \cdots \otimes u_k) = u_i$ for all $i \in [k]$.
A relation $R \subseteq (\Sigma^*)^n$ is called a regular $n$-ary relation over $\Sigma$ if there is a finite automaton
$\mathcal{A}$ over $\Sigma_\bot^n$ that accepts $\{\otimes \bar w \mid \bar w \in R\}$. The class of $n$-ary regular relations is denoted by
$\mathrm{REG}_n$; as before, we write REG when $n$ is clear or irrelevant.
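
To make the padding construction concrete, here is a small Python sketch of the convolution $w_1 \otimes \cdots \otimes w_n$ and of the projections $\pi_i$, followed by a hand-written synchronous check of the prefix relation on the convolution. The names `PAD`, `convolution`, `projection` and `prefix_via_convolution` are hypothetical, projections are indexed from 0, and the sketch assumes that the padding symbol does not occur in the input words.

```python
from itertools import zip_longest

PAD = "⊥"  # padding letter, assumed not to occur in the input words


def convolution(*words):
    # The k-th symbol is the tuple of k-th letters, with PAD for exhausted words.
    return [tuple(column) for column in zip_longest(*words, fillvalue=PAD)]


def projection(conv, i):
    # Recover the i-th word (0-based) by dropping the padding.
    return "".join(symbol[i] for symbol in conv if symbol[i] != PAD)


def prefix_via_convolution(u, w):
    # A two-state synchronous automaton: read symbols (a, a) while u lasts,
    # then only symbols (PAD, b). Any other symbol rejects.
    state = "equal"
    for a, b in convolution(u, w):
        if state == "equal" and a == b:
            continue
        if a == PAD:
            state = "tail"
            continue
        return False
    return True
```

The automaton reads both words in lockstep, one symbol of the convolution per step, which is what distinguishes regular relations from rational ones.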

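Finally, we define rational relations. There are two equivalent ways of doing it. One
uses regular expressions, which are now built from tuples $\bar a \in (\Sigma \cup \{\varepsilon\})^n$ using the same
operations of union, concatenation, and Kleene star. Binary relations $\preceq_{\mathrm{suff}}$, $\preceq$, and $\sqsubseteq$
are all rational: the expression $\big(\bigcup_{a \in \Sigma} (\varepsilon, a)\big)^* \cdot \big(\bigcup_{a \in \Sigma} (a, a)\big)^*$ defines $\preceq_{\mathrm{suff}}$, the expression
$\big(\bigcup_{a \in \Sigma} (\varepsilon, a)\big)^* \cdot \big(\bigcup_{a \in \Sigma} (a, a)\big)^* \cdot \big(\bigcup_{a \in \Sigma} (\varepsilon, a)\big)^*$ defines $\preceq$, and the expression
$\big(\bigcup_{a \in \Sigma} (\varepsilon, a) \cup (a, a)\big)^*$ defines $\sqsubseteq$.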

Alternatively, n-ary rational relations can be defined by means of n-tape automata, 
that have n heads for the tapes and one additional control; at every step, based on the 
state and the letters it is reading, the automaton can enter a new state and move some (but 
not necessarily all) tape heads. The classes of n-ary relations so defined are called rational 
$n$-ary relations; we use the notation $\mathrm{RAT}_n$ or just RAT, as before.



p. BARCELO, D. FIGUEIRA, AND L. LIBKIN 



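Relationships between classes of relations. While it is well known that $\mathrm{REC}_1 = \mathrm{REG}_1 =
\mathrm{RAT}_1$, we have strict inclusions

$$\mathrm{REC}_k \subsetneq \mathrm{REG}_k \subsetneq \mathrm{RAT}_k$$

for every $k > 1$ (see for example [6]). For instance, ${\preceq_{\mathrm{pref}}} \in \mathrm{REG}_2 - \mathrm{REC}_2$ and ${\preceq_{\mathrm{suff}}} \in
\mathrm{RAT}_2 - \mathrm{REG}_2$.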

The classes of recognizable and regular relations are closed under intersection; however 
the class of rational relations is not. In fact, one can find $R \in \mathrm{REG}_2$ and $S \in \mathrm{RAT}_2$ so that
$R \cap S \notin \mathrm{RAT}_2$. However, if $R \in \mathrm{REC}_n$ and $S \in \mathrm{RAT}_n$, then $R \cap S \in \mathrm{RAT}_n$.

Binary rational relations can be characterized as follows [6, 30]. A relation $R \subseteq \Sigma^* \times \Sigma^*$
is rational iff there is a finite alphabet $\Gamma$, a regular language $L \subseteq \Gamma^*$ and two alphabetic
morphisms $f, g : \Gamma^* \to \Sigma^*$ such that $R = \{(f(w), g(w)) \mid w \in L\}$. If we require $f$ and $g$ to
be strictly alphabetic morphisms, we get the class of length-preserving regular relations, i.e.,
$R \in \mathrm{REG}_2$ so that $(w, w') \in R$ implies $|w| = |w'|$. Regular binary relations are then finite
unions of relations of the form $\{(w \cdot u, w') \mid (w, w') \in R, u \in L\}$ and $\{(w, w' \cdot u) \mid (w, w') \in
R, u \in L\}$, where $R$ ranges over length-preserving regular relations, and $L$ over regular
languages.

Properties of classes of relations. Since relations in REC and REG are given by NFAs, they 
inherit all the closure/decidability properties of regular languages. If $R \in \mathrm{RAT}$, then each
of its projections is a regular language, and can be effectively constructed (e.g., from the
description of $R$ as an $n$-tape automaton). Hence, the nonemptiness problem is decidable
for rational relations. However, testing nonemptiness of the intersection of two rational
relations is undecidable [6]. Also, for $R, R' \in \mathrm{RAT}$, the following are undecidable: checking
whether $R \subseteq R'$ or $R = R'$, universality ($R = \Sigma^* \times \Sigma^*$), and checking whether $R \in \mathrm{REG}$ or
$R \in \mathrm{REC}$ [6, 12, 28].

Remark. We defined recognizable, regular, and rational relations over the same alphabet, 
i.e., as subsets of $(\Sigma^*)^n$. Of course it is possible to define them as subsets of $\Sigma_1^* \times \cdots \times \Sigma_n^*$,
with the $\Sigma_i$'s not necessarily distinct. Technically, there are no differences and all the results
will continue to hold. Indeed, one can simply consider a new alphabet $\Sigma$ as the disjoint
union of the $\Sigma_i$'s, and enforce the condition that the $i$th projection only use the letters from $\Sigma_i$
(this is possible for all the classes of relations we consider). In fact, in the proofs we shall 
be using both types of relations. 

Well-quasi-orders. A well-quasi-order ${\le} \subseteq A \times A$ is a reflexive and transitive relation such
that for every infinite sequence $(a_i)_{i \in \mathbb{N}}$ over $A$ there are $i < j$ with $a_i \le a_j$. We will make
use of the following two lemmas.

Lemma 2.1 (Higman's Lemma [25]). For every alphabet $\Sigma$, the subsequence relation ${\sqsubseteq} \subseteq
\Sigma^* \times \Sigma^*$ is a well-quasi-order.

Lemma 2.2 (Dickson's Lemma [20]). For every well-quasi-order ${\le} \subseteq A \times A$, the product
order ${\le^k} \subseteq A^k \times A^k$ (where $(a_1, \ldots, a_k) \le^k (a_1', \ldots, a_k')$ iff $a_i \le a_i'$ for all $i \in [k]$) is a
well-quasi-order.

3. Generalized intersection problem



We now formalize the main technical problem we study. Let $\mathcal{R}$ be a class of relations
over $\Sigma$, and $\mathcal{S}$ a class of binary relations over $\Sigma$. We use the notation $[m]$ for $\{1, \ldots, m\}$. If
$R$ is an $m$-ary relation, $S$ is a binary relation, and $I \subseteq [m]^2$, we write $R \cap_I S$ for the set of
tuples $(w_1, \ldots, w_m)$ in $R$ such that $(w_i, w_j) \in S$ whenever $(i, j) \in I$.

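The generalized intersection problem $(\mathcal{R} \cap_I \mathcal{S}) \overset{?}{=} \emptyset$ is defined as:

Problem: $(\mathcal{R} \cap_I \mathcal{S}) \overset{?}{=} \emptyset$
Input: an $m$-ary relation $R \in \mathcal{R}$, a relation $S \in \mathcal{S}$, and $I \subseteq [m]^2$
Question: is $R \cap_I S \neq \emptyset$?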



If $\mathcal{S} = \{S\}$, we write $S$ instead of $\{S\}$. We write $\mathrm{GenInt}_S(\mathcal{R})$ for the class of all
problems $(\mathcal{R} \cap_I S) \overset{?}{=} \emptyset$ where $S$ is fixed, i.e., the input consists of $R \in \mathcal{R}$ and $I$. As
was explained in the introduction, this problem captures the essence of evaluating queries
in various graph logics, e.g., CRPQs or ECRPQs extended with rational relations $S$. The
classes $\mathcal{R}$ will typically be REC and REG.
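
The definition of $R \cap_I S$ can be illustrated on a toy instance. The Python sketch below is not a decision procedure for the problem: it assumes that $R$ is a finite, explicitly listed set of tuples, whereas the actual inputs are (typically infinite) relations given by automata. The names `generalized_intersection` and `is_subsequence` are hypothetical.

```python
def is_subsequence(u, w):
    # u is obtained from w by deleting letters
    remaining = iter(w)
    return all(letter in remaining for letter in u)


def generalized_intersection(R, S, I):
    """Return the tuples of R in which S holds between coordinates i and j
    for every pair (i, j) in I. Indices are 1-based, as in the text;
    S is a Python predicate on pairs of words."""
    return [t for t in R if all(S(t[i - 1], t[j - 1]) for (i, j) in I)]


# A finite ternary relation and the pattern I = {(1, 2), (3, 1)}
R = [("ab", "aab", "b"), ("ba", "ab", "a")]
I = {(1, 2), (3, 1)}
witnesses = generalized_intersection(R, is_subsequence, I)
```

The answer to the problem is 'yes' exactly when the returned list is nonempty; for $m = 2$ and $I = \{(1, 2)\}$ this is the plain intersection $R \cap S$.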

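If $m = 2$ and $I = \{(1, 2)\}$, the generalized intersection problem becomes simply the
intersection problem for the classes $\mathcal{R}$ and $\mathcal{S}$ of binary relations:

Problem: $(\mathcal{R} \cap \mathcal{S}) \overset{?}{=} \emptyset$
Input: $R \in \mathcal{R}$ and $S \in \mathcal{S}$
Question: is $R \cap S \neq \emptyset$?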



The problem $(\mathrm{REC} \cap S) \overset{?}{=} \emptyset$ is decidable for every rational relation $S$, simply by
constructing $R \cap S$, which is a rational relation, and testing its nonemptiness. However,
$(\mathrm{REG} \cap S) \overset{?}{=} \emptyset$ could already be undecidable (we shall give one particularly simple example
later).

4. Graph logics and the generalized intersection problem 

In this section we show how the (generalized) intersection problems provide us with 
upper and lower bounds on the complexity of evaluating a variety of logical queries over 
graphs. We start by recalling the basic classes of logics used in querying graph data, and 
show that extending them with rational relations allows us to cast the query evaluation 
problem as an instance of the generalized intersection problem. The key observations are 
that: 

• the complexities of $\mathrm{GenInt}_S(\mathrm{REC})$ and $(\mathrm{REC} \cap S) \overset{?}{=} \emptyset$ provide an upper and a lower
bound for the complexity of evaluating CRPQ($S$) queries; and

• for ECRPQ($S$), these bounds are provided by the complexities of $\mathrm{GenInt}_S(\mathrm{REG})$
and of $(\mathrm{REG} \cap S) \overset{?}{=} \emptyset$.

The standard abstraction of graph databases [1] is finite $\Sigma$-labeled graphs $G = (V, E)$,
where $V$ is a finite set of nodes, or vertices, and $E \subseteq V \times \Sigma \times V$ is a set of labeled edges. A
path $\rho$ from $v_0$ to $v_m$ in $G$ is a sequence of edges $(v_0, a_0, v_1), (v_1, a_1, v_2), \ldots, (v_{m-1}, a_{m-1}, v_m)$
from $E$, for some $m \ge 0$. The label of $\rho$, denoted by $\lambda(\rho)$, is the word $a_0 \cdots a_{m-1} \in \Sigma^*$.

The main building blocks for graph queries are regular path queries, or RPQs [17]; they
are expressions of the form $x \xrightarrow{L} y$, where $L$ is a regular language. We normally assume that
$L$ is represented by a regular expression or an NFA. Given a $\Sigma$-labeled graph $G = (V, E)$,
the answer to an RPQ above is the set of pairs of nodes $(v, v')$ such that there is a path $\rho$
from $v$ to $v'$ with $\lambda(\rho) \in L$.
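
One standard way to answer an RPQ is to search the product of the graph with an NFA for $L$. The Python sketch below illustrates this idea under simplifying assumptions: the NFA has a single initial state and no $\varepsilon$-transitions, and the names `eval_rpq`, `edges`, `nfa_delta`, `nfa_init` and `nfa_final` are hypothetical.

```python
from collections import deque


def eval_rpq(edges, nfa_delta, nfa_init, nfa_final):
    """edges: set of labeled graph edges (v, a, v').
    nfa_delta: set of NFA transitions (q, a, q').
    Returns all pairs (v, v') joined by a path whose label the NFA accepts."""
    nodes = {v for (v, _, _) in edges} | {v for (_, _, v) in edges}
    out_edges = {}
    for v, a, v2 in edges:
        out_edges.setdefault(v, []).append((a, v2))
    delta = {}
    for q, a, q2 in nfa_delta:
        delta.setdefault((q, a), []).append(q2)

    answers = set()
    for start in nodes:
        # Breadth-first search over pairs (graph node, NFA state).
        seen = {(start, nfa_init)}
        queue = deque(seen)
        while queue:
            v, q = queue.popleft()
            if q in nfa_final:
                answers.add((start, v))
            for a, v2 in out_edges.get(v, []):
                for q2 in delta.get((q, a), []):
                    if (v2, q2) not in seen:
                        seen.add((v2, q2))
                        queue.append((v2, q2))
    return answers


# Graph with a cycle 1 -a-> 2 -b-> 3 -a-> 1, and an NFA for the language a b*
graph = {(1, "a", 2), (2, "b", 3), (3, "a", 1)}
pairs = eval_rpq(graph, {(0, "a", 1), (1, "b", 1)}, 0, {1})
```

The search visits each pair of a graph node and an NFA state at most once per start node, so it runs in time polynomial in the sizes of the graph and of the automaton.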

Conjunctive RPQs, or CRPQs [9, 10, 16], are the closure of RPQs under conjunction
and existential quantification. Formally, they are expressions of the form

$$\varphi(\bar x) = \exists \bar y \bigwedge_{i=1}^{m} (u_i \xrightarrow{L_i} u_i')$$ (4.1)

where variables $u_i$, $u_i'$ come from $\bar x, \bar y$. The semantics naturally extends the semantics of
RPQs: $\varphi(\bar a)$ is true in $G$ iff there is a tuple $\bar b$ of nodes such that for every $i \le m$ and every
$v_i, v_i'$ interpreting $u_i$ and $u_i'$, respectively, we have a path $\rho_i$ between $v_i$ and $v_i'$ whose label
$\lambda(\rho_i)$ is in $L_i$.

CRPQs can further be extended to compare paths. For that, we need to name path
variables, and choose a class of allowed relations on paths. The simplest such extension is
the class of CRPQ($S$) queries, where $S$ is a binary relation over $\Sigma^*$. Its formulae are of the
form

$$\varphi(\bar x) = \exists \bar y \Big( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u_i') \wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Big)$$ (4.2)

where $I \subseteq [m]^2$. We use variables $\chi_1, \ldots, \chi_m$ to denote paths; these are quantified
existentially. That is, the semantics of $G \models \varphi(\bar a)$ is that there is a tuple $\bar b$ of nodes and paths
$\rho_k$, for $k \le m$, between $v_k$ and $v_k'$ (where, as before, $v_k, v_k'$ are elements of $\bar a, \bar b$ interpreting
$u_k, u_k'$) such that $(\lambda(\rho_i), \lambda(\rho_j)) \in S$ whenever $(i, j) \in I$. For instance, the query

$$\exists y, y' \big( (x \xrightarrow{\chi : \Sigma^* a} y) \wedge (x \xrightarrow{\chi' : \Sigma^* b} y') \wedge \chi \sqsubseteq \chi' \big)$$

finds nodes $v$ so that there are two paths starting from $v$, one ending with an $a$-edge, whose
label is a subsequence of the other one, that ends with a $b$-edge.

The input to the query evaluation problem consists of a graph $G$, a tuple $\bar v$ of nodes,
and a query $\varphi(\bar x)$; the question is whether $G \models \varphi(\bar v)$. This corresponds to the combined
complexity of query evaluation. In the context of query evaluation, one is often interested
in data complexity, when the typically small formula $\varphi$ is fixed, and the input consists of
the typically large graph $(G, \bar v)$. We now relate it to the complexity of $\mathrm{GenInt}_S(\mathrm{REC})$.

Lemma 4.1. Fix a CRPQ($S$) query $\varphi$ as in (4.2). Then there is a DLogSpace algorithm
that, given a graph $G$ and a tuple $\bar v$ of nodes, constructs an $m$-ary relation $R \in \mathrm{REC}$ so that
the answer to the generalized intersection problem $(R \cap_I S) \overset{?}{=} \emptyset$ is 'yes' iff $G \models \varphi(\bar v)$.

Proof. Given a $\Sigma$-labeled graph $G = (V, E)$ and two nodes $v, v'$, we write $\mathcal{A}(G, v, v')$ for $G$
viewed as an NFA with the initial state $v$ and the final state $v'$ (that is, the set of states is
$V$, the transition relation is $E$, and the alphabet is $\Sigma$). The language of such an automaton,
$\mathcal{L}(\mathcal{A}(G, v, v'))$, is the set of labels of all paths between $v$ and $v'$.

Now consider a CRPQ($S$) query $\varphi(\bar x)$ given by

$$\exists \bar y \Big( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u_i') \wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Big),$$



GRAPH LOGICS WITH RATIONAL RELATIONS 



as in (4.2). Suppose we are given a graph $G$ as above and a tuple of nodes $\bar v$, of the same
length as the length of $\bar x$. The DLogSpace algorithm works as follows.

First we enumerate all tuples $\bar b$ of nodes of $G$ of the same length as $\bar y$; since $\varphi$ is fixed,
this can be done in DLogSpace. For each $\bar b$, we construct an $m$-ary relation $R_{\bar b}$ in REC as
follows. Let $n_i$ and $n_i'$ be the interpretations of $u_i$ and $u_i'$, when $\bar x$ is interpreted as $\bar v$ and $\bar y$
as $\bar b$. Then

$$R_{\bar b} = \prod_{i=1}^{m} \big( \mathcal{L}(\mathcal{A}(G, n_i, n_i')) \cap L_i \big).$$

Note that it can be constructed in DLogSpace; indeed each coordinate of $R_{\bar b}$ is simply
a product of the automaton $\mathcal{A}(G, n_i, n_i')$ and a fixed automaton defining $L_i$. Next, let
$R = \bigcup_{\bar b} R_{\bar b}$. This is constructed in DLogSpace too. Now it follows immediately from the
construction that $R \cap_I S \neq \emptyset$ iff for some $\bar b$, there exist paths $\rho_i$ between $n_i, n_i'$, for $i \le m$,
such that $(\lambda(\rho_i), \lambda(\rho_j)) \in S$ whenever $(i, j) \in I$, i.e., iff $G \models \varphi(\bar v)$. □

Conversely, the intersection problem for recognizable relations and S can be encoded 
as answering CRPQ(S') queries. 

Lemma 4.2. For any given binary relation $S$, there is a CRPQ($S$) query $\varphi(x, x')$ and a
DLogSpace algorithm that, given a relation $R \in \mathrm{REC}_2$, constructs a graph $G$ and two
nodes $v, v'$ so that $G \models \varphi(v, v')$ iff $R \cap S \neq \emptyset$.

Proof. Let $R$ be in $\mathrm{REC}_2$. It is given as $\bigcup_{i=1}^{n} (L_i \times K_i)$, where the $L_i$s and the $K_i$s are
regular languages over $\Sigma$. These languages are given by their NFAs which we can view as
$\Sigma$-labeled graphs. Let $(V_i, E_i)$ be the underlying graph of the NFA defining $L_i$, such that $v_0^i$
is the initial state, and $F_i$ is the set of final states. Likewise, let $(W_i, H_i)$ be the underlying
graph of the NFA defining $K_i$, such that $w_0^i$ is the initial state, and $C_i$ is the set of final
states.

We now construct the graph $G$. Its labeling alphabet is the union of $\Sigma$ and $\{\#, \$, !\}$.
Its set of vertices is the disjoint union of all the $V_i$s and $W_i$s, as well as two distinguished nodes
start and end. Its edges include all the edges from the $E_i$s and $H_i$s, and the following:

• #-labeled edges from start to each initial state, i.e., to each $v_0^i$ and $w_0^i$ for all $i \le n$.

• \$-labeled edges between the initial states of automata with the same index, i.e.,
edges $(v_0^i, \$, w_0^i)$ for all $i \le n$.

• !-labeled edges from final states to end, i.e., edges $(v, !, \mathit{end})$, where $v \in \bigcup_{i \le n} F_i \cup
\bigcup_{i \le n} C_i$.

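We now define a CRPQ($S$) query $\varphi(x, y)$ (omitting path variables for paths that are
not used in comparisons):

$$\exists x_1, x_2, z_1, z_2 \Big( x \xrightarrow{\#} x_1 \wedge x \xrightarrow{\#} x_2 \wedge x_1 \xrightarrow{\$} x_2 \wedge x_1 \xrightarrow{\chi : \Sigma^*} z_1 \wedge x_2 \xrightarrow{\chi' : \Sigma^*} z_2 \wedge z_1 \xrightarrow{!} y \wedge z_2 \xrightarrow{!} y \wedge S(\chi, \chi') \Big)$$
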

The query says that from start, we have #-edges to the initial states $v_0^i$ and $w_0^i$: they
must have the same index since there is a \$-edge between them. From there we have two
paths, $\rho$ and $\rho'$, corresponding to the variables $\chi$ and $\chi'$, which are $\Sigma$-labeled, and thus are
paths in the automata for $L_i$ and $K_i$, respectively. From the end nodes of those paths we
have !-edges to end, so they must be final states; in particular, $\lambda(\rho) \in L_i$ and $\lambda(\rho') \in K_i$. We
finally require $(\lambda(\rho), \lambda(\rho')) \in S$, i.e., $(\lambda(\rho), \lambda(\rho')) \in (L_i \times K_i) \cap S$. Hence, if $G \models \varphi(\mathit{start}, \mathit{end})$
then for some $i \le n$ we have a pair of words $(w, w')$ that belongs to $(L_i \times K_i) \cap S$, i.e., $R \cap S \neq \emptyset$.
Conversely, if $R \cap S \neq \emptyset$, then $(L_i \times K_i) \cap S \neq \emptyset$ for some $i \le n$, and the witnessing paths
of the nonemptiness of $(L_i \times K_i) \cap S$ will witness the formula $\varphi(\mathit{start}, \mathit{end})$ (together with
initial states of the automata of $L_i$ and $K_i$ and some of their final states). □

Combining the lemmas, we obtain: 

Theorem 4.3. Let $\mathcal{K}$ be a complexity class closed under DLogSpace reductions. Then:

(1) If the problem $\mathrm{GenInt}_S(\mathrm{REC})$ is in $\mathcal{K}$, then data complexity of CRPQ($S$) queries
is in $\mathcal{K}$; and

(2) If the problem $(\mathrm{REC} \cap S) \overset{?}{=} \emptyset$ is hard for $\mathcal{K}$, then so is data complexity of CRPQ($S$)
queries.

We now consider extended CRPQs, or ECRPQs, which enhance CRPQs with regular 
relations [4], and prove a similar result for them, with the role of REC now played by REG. 
Formally, ECRPQs are expressions of the form

$$\varphi(\bar x) = \exists \bar y \Big( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u_i') \wedge \bigwedge_{j=1}^{k} R_j(\bar\chi_j) \Big)$$ (4.3)

where each $R_j$ is a relation from REG, and $\bar\chi_j$ a tuple from $\chi_1, \ldots, \chi_m$ of the same arity
as $R_j$. The semantics of course extends the semantics of CRPQs: the witnessing paths
$\rho_1, \ldots, \rho_m$ should also satisfy the condition that for every atom $R(\chi_{i_1}, \ldots, \chi_{i_l})$ in (4.3), the
tuple $(\lambda(\rho_{i_1}), \ldots, \lambda(\rho_{i_l}))$ is in $R$.

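Finally, we obtain ECRPQ($S$) queries by adding comparisons with respect to a relation
$S \in \mathrm{RAT}$, getting a class of queries $\varphi(\bar x)$ of the form

$$\exists \bar y \Big( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u_i') \wedge \bigwedge_{j=1}^{k} R_j(\bar\chi_j) \wedge \bigwedge_{(j,l) \in I} S(\chi_j, \chi_l) \Big)$$ (4.4)
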

Similarly to the case of CRPQs, we can establish a connection between data complexity 
of ECRPQ($S$) queries and the complexity of the generalized intersection problem:

Theorem 4.4. Let $\mathcal{K}$ be a complexity class closed under DLogSpace reductions. Then:

(1) If the problem $\mathrm{GenInt}_S(\mathrm{REG})$ is in $\mathcal{K}$, then data complexity of ECRPQ($S$) queries
is in $\mathcal{K}$; and

(2) If the problem $(\mathrm{REG} \cap S) \overset{?}{=} \emptyset$ is hard for $\mathcal{K}$, then so is data complexity of ECRPQ($S$)
queries.

Similarly to the proof of Theorem 4.3, the result will be an immediate consequence 
of two lemmas. First, evaluation of ECRPQ(S') queries is reducible to the generalized 
intersection problem for regular relations. 

Lemma 4.5. Fix an ECRPQ($S$) query $\varphi$ as in (4.4). Then there is a DLogSpace algorithm
that, given a graph $G$ and a tuple $\bar v$ of nodes, constructs an $m$-ary relation $R \in \mathrm{REG}$ so
that the answer to the generalized intersection problem $(R \cap_I S) \overset{?}{=} \emptyset$ is 'yes' iff $G \models \varphi(\bar v)$.

Conversely, the intersection problem for regular relations and $S$ can be encoded as
answering ECRPQ($S$) queries.

Lemma 4.6. For each binary relation $S$, there is an ECRPQ($S$) query $\varphi(x, x')$ and a
DLogSpace algorithm that, given a relation $R \in \mathrm{REG}_2$, constructs a graph $G$ and two
nodes $v, v'$ so that $G \models \varphi(v, v')$ iff $R \cap S \neq \emptyset$.

The proof of Lemma 4.5 is almost the same as the proof of Lemma 4.1: as before, 
we enumerate tuples $\bar b$, construct relations $R_{\bar b}$ and $R = \bigcup_{\bar b} R_{\bar b}$, but this time we take the
product of this recognizable relation with regular relations mentioned in the query. Since 
the query is fixed, and hence we take a product with a fixed number of fixed automata, 
such a product construction can be done in DLogSpace. The result is now a regular m-ary 
relation. The rest of the proof is exactly the same as in Lemma 4.1. 

We now prove Lemma 4.6. Let $R \in \mathrm{REG}_2$ be given by an NFA over $\Sigma_\bot \times \Sigma_\bot$ whose underlying graph is $G_R = (V_R, E_R)$, where $E_R \subseteq V_R \times (\Sigma_\bot \times \Sigma_\bot) \times V_R$. Let $v_0$ be its initial state, and let $F$ be the set of final states.

We now define the graph $G$. Its labeling alphabet $\Gamma$ is the disjoint union of $\Sigma_\bot \times \Sigma_\bot$, the alphabet $\Sigma$ itself, and a new symbol $\#$. Its nodes $V$ include all nodes in $V_R$ and two extra nodes, $v_f$ and $v'$. The edges are:

• all the edges in $E_R$;

• edges $(v, \#, v_f)$ for every $v \in F$;

• edges $(v', a, v')$ for every $a \in \Sigma$.

We now define two regular relations over $\Gamma$. The first, $R_1$, consists of pairs $(w, w')$, where $w \in (\Sigma_\bot \times \Sigma_\bot)^*$ and $w' \in \Sigma^*$. Furthermore, $w$ is of the form $w' \otimes w''$ for some $w'' \in \Sigma^*$. It is straightforward to check that this relation is regular. The second one, $R_2$, is the same except that $w$ is of the form $w'' \otimes w'$. In other words, the first component is $w_1 \otimes w_2$, and the second is either $w_1$ or $w_2$, for $R_1$ or $R_2$, respectively.
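Next, we define the ECRPQ($S$) query $\varphi(x,y)$:

$$\exists x_1, y_1, x_2, y_2 \left( \begin{array}{l} x \xrightarrow{\chi \,:\, (\Sigma_\bot \times \Sigma_\bot)^* \cdot \#} y \;\wedge\; x_1 \xrightarrow{\chi_1} y_1 \;\wedge\; x_2 \xrightarrow{\chi_2} y_2 \\ \wedge\; R_1(\chi,\chi_1) \;\wedge\; R_2(\chi,\chi_2) \;\wedge\; S(\chi_1,\chi_2) \end{array} \right)$$
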
Note that when this formula is evaluated over $G$, with $x$ interpreted as $v_0$ and $y$ interpreted as $v_f$, the paths $\chi_1$ and $\chi_2$ can have arbitrary labels from $\Sigma^*$. Paths $\chi$ can have arbitrary labels over $\Sigma_\bot \times \Sigma_\bot$; however, since they start in $v_0$ and must be followed by a $\#$-edge, they end in a final state of the automaton for $R$, and hence labels of these paths are precisely words in $(\Sigma_\bot \times \Sigma_\bot)^*$ of the form $w_1 \otimes w_2$, where $(w_1, w_2) \in R$. Now $R_1$ ensures that the label of $\chi_1$ is $w_1$, and $R_2$ ensures that the label of $\chi_2$ is $w_2$. Hence the labels of $\chi_1$ and $\chi_2$ are precisely the pairs of words in $R$, and the query asks whether such a pair belongs to $S$. Hence, $G \models \varphi(v_0, v_f)$ iff $R \cap S \neq \emptyset$. It is straightforward to check that the construction of $G$ can be carried out in DLogSpace. This proves the lemma and the theorem.
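The graph construction in the proof of Lemma 4.6 can be sketched in a few lines. This is only an illustration under stated assumptions: the NFA is passed as plain Python sets, and the node names `"v_f"` and `"v_prime"` are hypothetical placeholders for $v_f$ and $v'$ that are assumed not to clash with the state names.

```python
def build_graph(states, edges, final_states, sigma):
    """Build the labelled graph G from an NFA for R.

    edges: set of (p, (a, b), q) triples, i.e. the edge set E_R.
    Returns (nodes, labelled_edges).
    """
    V_F, V_PRIME = "v_f", "v_prime"  # hypothetical names for the two extra nodes
    nodes = set(states) | {V_F, V_PRIME}
    g_edges = set(edges)                                 # all the edges in E_R
    g_edges |= {(q, "#", V_F) for q in final_states}     # (v, #, v_f) for v in F
    g_edges |= {(V_PRIME, a, V_PRIME) for a in sigma}    # self-loops (v', a, v')
    return nodes, g_edges
```

The output has $|V_R| + 2$ nodes and $|E_R| + |F| + |\Sigma|$ edges, which is consistent with the construction being computable with very limited resources.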

Thus, our next goal is to understand the behavior of the generalized intersection problem for various rational relations $S$ which are of interest in graph logics; those include subword, suffix, and subsequence. In fact, to rule out many undecidable or infeasible cases it is often sufficient to analyze the intersection problem. We do this in the next section, and then analyze the decidable cases to come up with graph logics that can be extended with rational relations.

5. The intersection problem: decidable and undecidable cases

We now study the problem $(\mathrm{REG} \cap S) \stackrel{?}{=} \emptyset$ for binary rational relations $S$ such as subword and subsequence, and for classes of relations generalizing them. The input is a binary regular relation $R$ over $\Sigma$, given by an NFA over $\Sigma_\bot \times \Sigma_\bot$. The question is whether $R \cap S \neq \emptyset$. We also derive results about the complexity of ECRPQ($S$) queries. For all lower-bound results in this section, we assume that the alphabet contains at least two symbols.

As already mentioned, there exist rational relations $S$ such that $(\mathrm{REG} \cap S) \stackrel{?}{=} \emptyset$ is undecidable. However, we are interested in relations that are useful in graph querying, and that are among the most commonly used rational relations, and for them the status of the problem was unknown.

Note that the problem $(\mathrm{REC} \cap S) \stackrel{?}{=} \emptyset$ is tractable: given $R \in \mathrm{REC}$, the relation $R \cap S$ is rational, can be efficiently constructed, and checked for nonemptiness.

5.1. Undecidable cases: subword and relatives. We now show that even for such simple relations as subword and suffix, the intersection problem is undecidable. That is, given an NFA over $\Sigma_\bot \times \Sigma_\bot$ defining a regular relation $R$, the problem of checking for the existence of a pair $(w, w') \in R$ with $w \preceq_{\mathrm{suff}} w'$ or $w \preceq w'$ is undecidable.

Theorem 5.1. The problems $(\mathrm{REG} \cap \preceq_{\mathrm{suff}}) \stackrel{?}{=} \emptyset$ and $(\mathrm{REG} \cap \preceq) \stackrel{?}{=} \emptyset$ are undecidable.

As an immediate consequence of this, we obtain:

Corollary 5.2. The query evaluation problem for ECRPQ($\preceq_{\mathrm{suff}}$) and ECRPQ($\preceq$) is undecidable.

Thus, some of the most commonly used rational relations cannot be added to ECRPQs without imposing further restrictions.

We skip the proof of Theorem 5.1 for the time being and concentrate first on how to obtain a more general undecidability result out of it. As we will see below, the essence of the undecidability result is that relations such as $\preceq_{\mathrm{suff}}$ and $\preceq$ can be decomposed in such a way that one of the components of the decomposition is the graph of a nontrivial strictly alphabetic morphism. More precisely, let $R \cdot R'$ be the binary relation $\{(w \cdot w', u \cdot u') \mid (w,u) \in R \text{ and } (w',u') \in R'\}$. Let $\mathrm{Graph}(f)$ be the graph of a function $f : \Sigma^* \to \Sigma^*$, i.e., $\{(w, f(w)) \mid w \in \Sigma^*\}$.

Proposition 5.3. Let $R_0, R_1$ be binary relations on $\Sigma$ such that $R_0$ is recognizable and its second projection is $\Sigma^*$. Let $f$ be a strictly alphabetic morphism that is not constant (i.e., the image of $f$ contains at least two letters). Then, for $S = R_0 \cdot \mathrm{Graph}(f) \cdot R_1$, the problem $(\mathrm{REG} \cap S) \stackrel{?}{=} \emptyset$ is undecidable.

Note that both $\preceq_{\mathrm{suff}}$ and $\preceq$ are of the required shape: suffix is $(\{\varepsilon\} \times \Sigma^*) \cdot \mathrm{Graph}(\mathrm{id}) \cdot (\{\varepsilon\} \times \{\varepsilon\})$, and subword is $(\{\varepsilon\} \times \Sigma^*) \cdot \mathrm{Graph}(\mathrm{id}) \cdot (\{\varepsilon\} \times \Sigma^*)$, where $\mathrm{id}$ is the identity alphabetic morphism.

Proofs of Theorem 5.1 and Proposition 5.3. We present the proof for the suffix relation $\preceq_{\mathrm{suff}}$. The proofs for the subword relation and, more generally, for the relations containing the graph of an alphabetic morphism follow the same idea and will be explained after the proof for $\preceq_{\mathrm{suff}}$. The proof is by encoding nonemptiness for linearly bounded automata (LBA). Recall that an LBA $\mathcal{A}$ has a tape alphabet $\Gamma$ that contains two distinguished symbols, $\alpha$ and $\beta$, which are the left and the right marker. The input word $w \in (\Gamma - \{\alpha,\beta\})^*$ is written between them, i.e., the content of the input tape is $\alpha \cdot w \cdot \beta$. The LBA behaves just like a Turing machine, except that when it is reading $\alpha$ or $\beta$, it cannot rewrite them, and it cannot move left of $\alpha$ or right of $\beta$. The problem of checking whether the language of a given LBA is nonempty is undecidable.

We encode this as follows. The alphabet $\Sigma$ is the disjoint union of the tape alphabet $\Gamma$ of the LBA $\mathcal{A}$, the set of its states $Q$, and the designated symbol $\$$ (we assume, of course, that these are disjoint). A configuration $C$ of the LBA consists of the tape content $a_0 \ldots a_n$, where $a_0 = \alpha$ and $a_n = \beta$, and all the $a_i$'s, for $0 < i < n$, are letters from $\Gamma - \{\alpha,\beta\}$, the state $q$, and the position $i$, for $0 \le i \le n$, that the head is pointing to. We encode this as a word

$$w_C = \$\, a_0 \ldots a_{i-1}\, q\, a_i \ldots a_n\, \$ \in \Sigma^*$$

of length $n + 4$. Of course, if the head is pointing to $\alpha$, the configuration is $\$\, q\, a_0 \ldots a_n\, \$$. Note that if we have a run of the LBA with configurations $C_0, C_1, \ldots$, then the lengths of all the $w_{C_i}$'s are the same.
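As a small illustration of this encoding, the following sketch builds $w_C$ as a list of symbols. The function name and the string names used for the markers and states are hypothetical; only the shape of the encoding comes from the text.

```python
def encode_config(tape, state, head):
    """Encode a configuration as $ a_0 .. a_{i-1} q a_i .. a_n $.

    tape:  list of tape symbols a_0..a_n (including both markers)
    state: the current state q
    head:  the position i (0-based) that the head is pointing to
    """
    return ["$"] + tape[:head] + [state] + tape[head:] + ["$"]
```

For a tape $a_0 \ldots a_n$ the result always has $n + 4$ symbols, independently of the head position, which is the property the proof relies on.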

Next, note that the relation

$$R^{\mathrm{imm}}_{\mathcal{A}} = \{(w_C, w_{C'}) \mid C' \text{ is an immediate successor of } C\}$$

is regular (in fact such a relation is well-known to be regular even for arbitrary Turing machines [5, 7, 8]). Since all configurations are of the same length, we obtain that the relation

$$R'_{\mathcal{A}} = \{(w_{C_0} w_{C_1} \cdots w_{C_m},\; w_{C'_1} \cdots w_{C'_m}) \mid C'_{i+1} \text{ is an immediate successor of } C_i \text{ for } i < m\}$$

is regular too (since only one configuration in the first projection does not correspond to a configuration in the second projection). By taking the product with a regular language that ensures that the first symbol from $Q$ in a word is $q_0$, and the last such symbol is from $F$, we have a regular relation

$$R_{\mathcal{A}} = \left\{ (w_{C_0} w_{C_1} \cdots w_{C_m},\; w_{C'_1} \cdots w_{C'_m}) \;\middle|\; \begin{array}{l} C'_{i+1} \text{ is an immediate successor of } C_i \text{ for } i < m; \\ C_0 \text{ is an initial configuration}; \\ C_m \text{ is a final configuration} \end{array} \right\}$$

which can be effectively constructed from the description of the LBA.

Now assume that $R_{\mathcal{A}} \cap \preceq_{\mathrm{suff}}$ is nonempty. Then, since all encodings of configurations are of the same length, it must contain a pair $(w_{C_0} w_{C_1} \cdots w_{C_m},\, w_{C_1} \cdots w_{C_m})$ such that $C_{i+1}$ is an immediate successor of $C_i$ for all $i < m$. Since $C_0$ is an initial configuration and $C_m$ is a final configuration, this implies that the LBA has an accepting computation. Conversely, if there is an accepting computation with a sequence of configurations $C_0, C_1, \ldots, C_m$ of the LBA, then the pair $(w_{C_0} w_{C_1} \cdots w_{C_m},\, w_{C_1} \cdots w_{C_m})$ is both in $R_{\mathcal{A}}$ and in the suffix relation. Hence, $R_{\mathcal{A}} \cap \preceq_{\mathrm{suff}}$ is nonempty iff there is an accepting computation of the LBA, proving undecidability.

The proof for the subword relation is practically the same. We change the definition of relation $R_{\mathcal{A}}$ so that there is an extra $\$$ symbol inserted between $w_{C_0}$ and $w_{C_1}$, and two extra $\$$ symbols after $w_{C_m}$ in the first projection; in the second projection we insert two extra $\$$ symbols before $w_{C'_1}$ and after $w_{C'_m}$. Note that the relation remains regular: even if the components are not fully synchronized, at every point there is a constant delay between them (either 2 or 1), and this can be captured by simply encoding one or two alphabet symbols into the state. Since in each word there are precisely two places where the subword $\$\$\$$ appears, the subword relation in this case becomes the suffix relation, and the previous proof applies.
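The role played by the equal-length encodings in the suffix argument can be checked on toy data. The sketch below is an illustration only: configuration words are lists of symbols, and the three example words are made up rather than taken from an actual LBA.

```python
def concat(words):
    """Concatenate a list of encoded configurations into one word."""
    return [symbol for word in words for symbol in word]


def is_suffix(short, long):
    """True iff `short` is a suffix of `long` (both are lists of symbols)."""
    return len(short) <= len(long) and long[len(long) - len(short):] == short


# Three hypothetical configuration words of the same length.
c0 = ["$", "q0", "a", "b", "$"]
c1 = ["$", "a", "q1", "b", "$"]
c2 = ["$", "a", "b", "q2", "$"]

first = concat([c0, c1, c2])         # w_{C_0} w_{C_1} w_{C_2}
aligned = concat([c1, c2])           # w_{C_1} w_{C_2}
misaligned = concat([c1, c1])        # second projection disagrees with the first
```

Here `is_suffix(aligned, first)` holds while `is_suffix(misaligned, first)` does not: since every block has the same length, the suffix test forces the $i$-th block of the second projection to coincide with the $(i+1)$-th block of the first.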

The same proof can be applied to deduce Proposition 5.3. Note that we can encode letters of the alphabet $\Sigma$ within the alphabet $\{0, 1\}$ so that the encodings of each letter of $\Sigma$ will have the same length, namely $\lceil \log_2(|\Gamma| + |Q| + 1) \rceil$. Then the same proof as before will apply to show undecidability over the alphabet $\{0, 1\}$, since the encodings of configurations still have the same length.
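A fixed-width binary code of this kind is easy to produce. The following sketch assumes the symbols are given as a Python collection of strings; the function names are hypothetical, and the width is computed with integer arithmetic to avoid floating-point rounding.

```python
def fixed_width_code(symbols):
    """Map each symbol to a binary string of width ceil(log2(len(symbols)))."""
    ordered = sorted(symbols)
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1; use width >= 1.
    width = max(1, (len(ordered) - 1).bit_length())
    return {s: format(i, "0{}b".format(width)) for i, s in enumerate(ordered)}


def encode_word(word, code):
    """Encode a word (sequence of symbols) letter by letter."""
    return "".join(code[s] for s in word)
```

With five symbols the width is 3, so every word of length $k$ is encoded by exactly $3k$ bits, and words of equal length keep equal length after encoding.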

Since $R_0$ is recognizable, it is of the form $\bigcup_i L_i \times K_i$, and by the assumption, $\bigcup_i K_i = \Sigma^*$. Thus, the encoding of the initial configuration will belong to one of the $K_i$'s, say $K_j$. We then take a fixed word $w_0 \in L_j$ and assume that the second component of the relation starts with $w_0$ (which can be enforced by the regular relation). Likewise, we take a fixed pair $(w_1, w_2) \in R_1$, and assume that $w_1$ is the suffix of the first component of the relation, and $w_2$ is the suffix of the second. This too can be enforced by the regular relation.

Now if we have a non-constant alphabetic morphism $f$, we have two letters, say $a$ and $b$, so that $f(a) \neq f(b)$. We now simply use these letters, with $a$ playing the role of 0, and $b$ playing the role of 1 in the first projection of relation $R$, and $f(a), f(b)$ playing the roles of 0 and 1 in the second projection, to encode the run of an LBA as we did before. The only difference is that instead of a sequence of $\$$ symbols to specify the positions of the encoding we use a (fixed-length) sequence that is different from $w_0, w_1, w_2$ above, to identify its position uniquely. Then the proof we have presented above applies verbatim. □



5.2. Decidable cases: subsequence and relatives. We now show that the intersection problem is decidable for the subsequence relation $\sqsubseteq$ and, much more generally, for a class of relations that do not, like the relations considered in the previous section, have a "rigid" part. More precisely, the problem is also decidable for any relation so that its projection on the first component is closed under subsequence. However, the complexity bounds are extremely high. In fact we show that the complexity of checking whether $(R \cap \sqsubseteq) \neq \emptyset$, when $R$ ranges over $\mathrm{REG}_2$, is not bounded by any multiply-recursive function. This was previously known for $R$ ranging over $\mathrm{RAT}_2$, and was viewed as the simplest problem with non-multiply-recursive complexity [13]. We now push it further and show that this high complexity is already achieved with regular relations.

Some of the ideas for showing this come from a decidable relaxation of the Post Correspondence Problem (PCP), namely the regular Post Embedding Problem, or $\mathrm{PEP}^{\mathrm{reg}}$, introduced in [13]. An instance of this problem consists of two morphisms $\sigma, \sigma' : \Sigma^* \to \Gamma^*$ and a regular language $L \subseteq \Sigma^*$; the question is whether there is some $w \in L$ such that $\sigma(w) \sqsubseteq \sigma'(w)$ (recall that in the case of the PCP the question is whether $\sigma(w) = \sigma'(w)$ with $L = \Sigma^+$). We call $w$ a solution to the instance $(\sigma, \sigma', L)$. The $\mathrm{PEP}^{\mathrm{reg}}$ problem is known to be decidable, and as hard as the reachability problem for lossy channel systems [13], which cannot be bounded by any primitive-recursive function; in fact, it cannot be bounded by any multiply-recursive function (a generalization of primitive recursive functions with hyper-Ackermannian complexity, see [31]). More precisely, it is shown in [32] to be precisely at the level $\mathbf{F}_{\omega^\omega}$ of the fast-growing hierarchy of recursive functions [29, 31].
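To make the definition of a $\mathrm{PEP}^{\mathrm{reg}}$ solution concrete, here is a bounded brute-force search. It is a sketch under several assumptions: the morphisms are Python dictionaries on single-character letters, the regular language is given as a regular expression, and only words up to `max_len` are tried, so an empty result proves nothing about longer words. It is in no way the decision procedure of [13].

```python
import re
from itertools import product


def is_subsequence(u, v):
    """True iff u can be obtained from v by deleting letters."""
    it = iter(v)
    return all(ch in it for ch in u)


def pep_solutions(sigma, sigma_p, lang_regex, alphabet, max_len):
    """Yield all w in L with |w| <= max_len and sigma(w) a subsequence of sigma'(w)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if re.fullmatch(lang_regex, w):
                u = "".join(sigma[c] for c in w)
                v = "".join(sigma_p[c] for c in w)
                if is_subsequence(u, v):
                    yield w
```

For instance, with $\sigma(x) = ab$, $\sigma(y) = b$, $\sigma'(x) = a$, $\sigma'(y) = bb$ and $L = x^+y^+$, the word $xy$ is a solution because $\sigma(xy) = abb = \sigma'(xy)$.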



In this hierarchy, also known as the Extended Grzegorczyk Hierarchy, the classes of functions $\mathbf{F}_\alpha$ are closed under elementary-recursive reductions, and are indexed by ordinals. Ackermannian complexity corresponds to level $\alpha = \omega$, and level $\alpha = \omega^\omega$ corresponds to some hyper-Ackermannian complexity.

The problem $\mathrm{PEP}^{\mathrm{reg}}$ is just a reformulation of the problem $(\mathrm{RAT} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$. Indeed, relations of the form $\{(f(w), g(w)) \mid w \in L\}$, where $L \subseteq \Sigma^*$ ranges over regular languages and $f, g$ over morphisms $\Sigma^* \to \Gamma^*$, are precisely the relations in $\mathrm{RAT}_2$ [6, 30]. Hence, $(\mathrm{RAT} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$ is decidable, with non-multiply-recursive complexity.

Proposition 5.4 ([13]). $(\mathrm{RAT} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$ is decidable, with non-multiply-recursive complexity.

We show that the lower bound already applies to regular relations.

Theorem 5.5. The problem $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$ is decidable, and its complexity is not bounded by any multiply-recursive function.

The proof of the theorem is given further below, after some preparatory definitions and lemmas are introduced.

It is worth noticing that one cannot solve the problem $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$ by simply reducing to nonemptiness of rational relations, due to the following.

Proposition 5.6. There is a binary regular relation $R$ such that $R \cap \sqsubseteq$ is not rational.

Proof. Let $\Sigma = \{a, b\}$, and consider the following regular relation,

$$R = \{(a^m, b^m \cdot a^{m'}) \mid m, m' \in \mathbb{N}\}.$$

Note that the relation $R \cap \sqsubseteq$ is then $\{(a^m, b^m \cdot a^{m'}) \mid m, m' \in \mathbb{N},\; m' \ge m\}$. We show that $R \cap \sqsubseteq$ is not rational by means of contradiction. Suppose that it is, and let $A$ be an NFA over $\{a,b,\varepsilon\} \times \{a,b,\varepsilon\}$ that recognizes $R \cap \sqsubseteq$. Suppose $Q$ is the set of states of $A$, and $|Q| = n$.

Consider the following pair

$$(a^{n+1}, b^{n+1} \cdot a^{n+1}) \in R \cap \sqsubseteq.$$

Then there must be some $u \in (\{a,b,\varepsilon\} \times \{a,b,\varepsilon\})^*$ such that

$$(\pi_1(u), \pi_2(u)) = (a^{n+1}, b^{n+1} \cdot a^{n+1})$$

and $u \in \mathcal{L}(A)$. Let $\rho_A : [0..|u|] \to Q$ be the accepting run of $A$ on $u$, and let $1 \le i_1 < \cdots < i_{n+1} \le |u|$ be such that $\pi_2(u[i_j]) = a$ for all $j \in [n+1]$. Clearly, among $\rho_A(i_1), \ldots, \rho_A(i_{n+1})$ there must be two repeating elements by the pigeonhole principle. Let $1 \le j_1 < j_2 \le n+1$ be such elements, where $\rho_A(i_{j_1}) = \rho_A(i_{j_2})$. Hence $u' = u[1..i_{j_1} - 1] \cdot u[i_{j_2}..] \in \mathcal{L}(A)$, and therefore

$$(\pi_1(u'), \pi_2(u')) \in R \cap \sqsubseteq.$$

Notice that $\pi_2(u') = b^{n+1} \cdot a^{n+1-(j_2 - j_1)}$. But by definition of $R \cap \sqsubseteq$ we have that $\pi_1(u') = a^{n+1}$ with $n + 1 - (j_2 - j_1) \ge n + 1$, which is clearly false. The contradiction comes from the assumption that $R \cap \sqsubseteq$ is rational. □
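The shape of $R \cap \sqsubseteq$ claimed at the start of this proof can be confirmed by enumeration for small exponents. The sketch below only checks the counting condition $m' \ge m$ on a finite range; the function names are hypothetical and nothing here bears on the non-rationality argument itself.

```python
def is_subsequence(u, v):
    """True iff u can be obtained from v by deleting letters."""
    it = iter(v)
    return all(ch in it for ch in u)


def pairs_in_intersection(bound):
    """All (m, m') with m, m' <= bound such that a^m is a subsequence of b^m a^m'."""
    return {
        (m, mp)
        for m in range(bound + 1)
        for mp in range(bound + 1)
        if is_subsequence("a" * m, "b" * m + "a" * mp)
    }
```

On this finite range the pairs returned are exactly those with $m' \ge m$, that is, membership requires comparing two unbounded counters, which is the source of the pumping contradiction above.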

As already mentioned, the decidability part of Theorem 5.5 follows from Proposition 5.4. We prove the lower bound by reducing $\mathrm{PEP}^{\mathrm{reg}}$ into $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$.

This reduction is done in two phases. First, we show that there is a reduction from $\mathrm{PEP}^{\mathrm{reg}}$ into the problem of finding solutions of $\mathrm{PEP}^{\mathrm{reg}}$ with a certain shape, which we call strict codirect solutions (Lemma 5.7). Second, we show that there is a reduction from the problem of finding strict codirect solutions of a $\mathrm{PEP}^{\mathrm{reg}}$ instance into $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$ (Proposition 5.8). Both reductions are elementary and thus the hardness result of Theorem 5.5 follows.

In the next section we define the strict codirect solutions for $\mathrm{PEP}^{\mathrm{reg}}$, showing that we can restrict to this kind of solution. In the succeeding section we show how to reduce the problem into $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$.

5.2.1. Codirect solutions of $\mathrm{PEP}^{\mathrm{reg}}$. There are some variations of the $\mathrm{PEP}^{\mathrm{reg}}$ problem that turn out to be equivalent problems. These variations restrict the solutions to have certain properties. Given a $\mathrm{PEP}^{\mathrm{reg}}$ instance $(\sigma, \sigma', L)$, we say that $w \in L$ with $|w| = m$ is a codirect solution if there are (possibly empty) words $v_1, \ldots, v_m$ such that

1. $v_k \sqsubseteq \sigma'(w[k])$ for all $1 \le k \le m$,

2. $\sigma(w[1..m]) = v_1 \cdots v_m$, and

3. $|\sigma(w[1..k])| \ge |v_1 \cdots v_k|$ for all $1 \le k \le m$.

If furthermore

4. $|\sigma(w[1..k])| > |v_1 \cdots v_k|$ for all $1 \le k < m$,

we say that it is a strict codirect solution. In this case we say that the solution $w$ is witnessed by $v_1, \ldots, v_m$. In [13] it has been shown that the problem of whether an instance of the $\mathrm{PEP}^{\mathrm{reg}}$ problem has a codirect solution is equivalent to the problem of whether it has a solution. Moreover, it can be shown that this also holds for strict codirect solutions.
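Conditions 1 to 4 translate directly into a checker. The sketch below is illustrative: morphisms are dictionaries on single-character letters, the witnesses $v_1, \ldots, v_m$ are passed as a list of strings, membership of $w$ in $L$ is not checked, and all names are hypothetical.

```python
def is_subsequence(u, v):
    """True iff u can be obtained from v by deleting letters."""
    it = iter(v)
    return all(ch in it for ch in u)


def check_codirect(w, vs, sigma, sigma_p, strict=False):
    """Check conditions 1-3 (and 4 when strict=True) for witnesses vs."""
    m = len(w)
    if len(vs) != m:
        return False
    # Condition 1: each v_k embeds into sigma'(w[k]).
    if not all(is_subsequence(vs[k], sigma_p[w[k]]) for k in range(m)):
        return False
    # Condition 2: sigma(w) = v_1 ... v_m.
    if "".join(sigma[c] for c in w) != "".join(vs):
        return False
    # Conditions 3 and 4: compare prefix lengths.
    lhs = rhs = 0
    for k in range(m):
        lhs += len(sigma[w[k]])
        rhs += len(vs[k])
        if lhs < rhs:
            return False                      # violates condition 3
        if strict and k < m - 1 and lhs == rhs:
            return False                      # violates condition 4
    return True
```

For example, with $\sigma(x) = a$, $\sigma(y) = \varepsilon$, $\sigma'(x) = \varepsilon$, $\sigma'(y) = a$, the word $xy$ with witnesses $\varepsilon, a$ is a strict codirect solution, whereas for $\sigma(x) = \sigma'(x) = a$ the word $xx$ with witnesses $a, a$ is codirect but not strict.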

Lemma 5.7. The problem of whether a $\mathrm{PEP}^{\mathrm{reg}}$ instance has a strict codirect solution is as hard as whether a $\mathrm{PEP}^{\mathrm{reg}}$ instance has a solution.

Proof. We only show how to reduce the problem of finding a codirect solution to the problem of finding a strict codirect solution. The other direction is trivial, since a strict codirect solution is in particular a solution. Let $(\sigma, \sigma', L)$ be a $\mathrm{PEP}^{\mathrm{reg}}$ instance, and let $w \in L$ be a codirect solution with $|w| = m$, minimal in size, and witnessed by $v_1, \ldots, v_m$. Let $A = (Q, \Sigma, q_0, \delta, F)$ be an NFA representing $L$, where $|Q| = n$. Let $\rho : [0..m] \to Q$ be an accepting run of $A$ on $w$. Let $0 \le k_1 < \cdots < k_t \le m$ be all the elements of $\{s \ge 0 : |\sigma(w[1..s])| = |v_1 \cdots v_s|\}$. Observe that $k_1 = 0$, and $k_t = m$ by condition 2. It is not difficult to show that by minimality of $m$ there cannot be more than $n$ indices.

Claim 5.7.1. $t \le n$.

Proof. Suppose ad absurdum that $t \ge n + 1$. Then, there must be two $k_l < k_{l'}$ such that $\rho(k_l) = \rho(k_{l'})$. Hence, $w' = w[1..k_l] \cdot w[k_{l'} + 1..] \in L$ is also a codirect solution, contradicting that $w$ is a minimal size solution. □

Let $L[q, q']$ be the regular language denoted by the NFA $(Q, \Sigma, q, \delta, \{q'\})$.

Claim 5.7.2. For every $i < t$, $(\sigma, \sigma', L[\rho(k_i), \rho(k_{i+1})])$ has a strict codirect solution.

Proof. We show that for every $i < t$, $w[k_i + 1..k_{i+1}]$ is a solution for $(\sigma, \sigma', L[\rho(k_i), \rho(k_{i+1})])$, witnessed by $v_{k_i+1}, \ldots, v_{k_{i+1}}$.

Clearly, condition 1 still holds. Further, since

$$|\sigma(w[1..k_i])| = |v_1 \cdots v_{k_i}| \quad\text{and}\quad |\sigma(w[1..k_{i+1}])| = |v_1 \cdots v_{k_{i+1}}|,$$

we have that $|\sigma(w[k_i + 1..k_{i+1}])| = |v_{k_i+1} \cdots v_{k_{i+1}}|$ and then

$$\sigma(w[k_i + 1..k_{i+1}]) = v_{k_i+1} \cdots v_{k_{i+1}},$$

verifying condition 2.

Finally, by the fact that $k_i$ and $k_{i+1}$ are consecutive indices we cannot have some $k'$ with $k_i + 1 \le k' < k_{i+1}$ so that $|\sigma(w[k_i + 1..k'])| = |v_{k_i+1} \cdots v_{k'}|$, since it would imply $|\sigma(w[1..k'])| = |v_1 \cdots v_{k'}|$ and in this case $k' \ge k_{i+1}$. Then, conditions 3 and 4 hold. □

Therefore, we obtain the following reduction.

Claim 5.7.3. $(\sigma, \sigma', L)$ has a codirect solution if, and only if, there exist $\{q_1, \ldots, q_t\} \subseteq Q$ with $q_1 = q_0$ and $q_t \in F$, such that for every $i < t$, $(\sigma, \sigma', L[q_i, q_{i+1}])$ has a strict codirect solution.

The fact that this reduction is exponential is outweighed by the fact that we are dealing with a much harder problem. □

With the help of Lemma 5.7 we prove Theorem 5.5 in the next section. 

5.2.2. Proof of Theorem 5.5. Since decidability follows from Proposition 5.4, we only show the lower bound. To this end, we show how to code the existence of a strict codirect solution as an instance of $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$.

Proposition 5.8. There is an elementary reduction from the existence of strict codirect solutions of $\mathrm{PEP}^{\mathrm{reg}}$ into $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$.

Given a $\mathrm{PEP}^{\mathrm{reg}}$ instance $(\sigma, \sigma', L)$, remember that the presence of a strict codirect solution enforces that if there is a pair $(u, v) = (\sigma(w), \sigma'(w))$ with $w \in L$ and $u \sqsubseteq v$, it is such that for every proper prefix $u'$ of $u$ the smallest prefix $v'$ of $v$ such that $u' \sqsubseteq v'$ must be so that $|v'| > |u'|$. In the proof, we convert the rational relation $R = \{(\sigma(w), \sigma'(w)) \mid w \in L\}$ into a length-preserving regular relation $R'$ over an extended alphabet $\Gamma \cup \{\#\}$, defined as the set of all pairs $(u, v) \in (\Gamma \cup \{\#\})^* \times (\Gamma \cup \{\#\})^*$ so that $|u| = |v|$ and $(u_\Gamma, v_\Gamma) \in R$. If we now let $R''$ be the regular relation $R' \cdot \{(\varepsilon, v) \mid v \in \{\#\}^*\}$, we obtain that:

(i) if $w \in R'' \cap \sqsubseteq$ then $w' \in R \cap \sqsubseteq$, where $w'$ is the projection of $w$ onto $\Gamma^* \times \Gamma^*$; and

(ii) if there is some strict codirect solution $w' \in R \cap \sqsubseteq$, then there is some $w \in R'' \cap \sqsubseteq$ such that $w'$ is the projection of $w$ onto $\Gamma^* \times \Gamma^*$.

Whereas (i) is trivial, (ii) follows from the fact that $w'$ is a strict codirect solution. If $(u, v) \in R''$, where $f(w) = u_\Gamma$ and $g(w) = v_\Gamma$, the complication is now that, since $u \in (\Gamma \cup \{\#\})^*$, it could be that $u \not\sqsubseteq v$ just because there is some $\#$ in $u$ that does not appear in $v$. But we show how to build $(u, v)$ such that whenever $u[i] = \#$ forces $v[j] = \#$ with $j > i$ then we also have that $u[j] = \#$. This repeats, forcing $v[k] = \#$ for some $k > j$ and so on, until we reach the tail of $v$ that has sufficiently many $\#$'s to satisfy all the accumulated demands for occurrences of $\#$.

Proof of Proposition 5.8. Let $(\sigma, \sigma', L)$ be a $\mathrm{PEP}^{\mathrm{reg}}$ instance. For every $a \in \Sigma$, consider the binary relation $R_a$ consisting of all pairs $(u, u') \in (\Gamma \cup \{\#\})^* \times (\Gamma \cup \{\#\})^*$ such that $u_\Gamma = \sigma(a)$, $u'_\Gamma = \sigma'(a)$ and $|u| = |u'|$. Note that $R_a$ is a length-preserving regular relation. Let $R'$ be the set of pairs $(u_1 \cdots u_m, u'_1 \cdots u'_m)$ such that there exists $w \in L$ where $|w| = m$ and $(u_i, u'_i) \in R_{w[i]}$ for all $i$. Note that $R'$ is still a length-preserving regular relation. Finally, we define $R$ as the set of pairs $(u, u' \cdot u'')$ such that $(u, u') \in R'$ and $u'' \in \{\#\}^*$. $R$ is no longer a length-preserving relation, but it is regular. Observe that if $R \cap \sqsubseteq \neq \emptyset$, then $(\sigma, \sigma', L)$ has a solution. Conversely, we show that if $(\sigma, \sigma', L)$ has a strict codirect solution, then $R \cap \sqsubseteq \neq \emptyset$.

Suppose that the $\mathrm{PEP}^{\mathrm{reg}}$ instance $(\sigma, \sigma', L)$ has a strict codirect solution $w \in L$ with $|w| = m$, witnessed by $v_1, \ldots, v_m$. Assume, without any loss of generality, that $\sigma$ and $\sigma'$ are alphabetic morphisms and that $m > 1$. We exhibit a pair $(u, u') \in R$ such that $u \sqsubseteq u'$. We define $(u, u') = (u_1 \cdots u_m,\; u'_1 \cdots u'_m \cdot u'_{m+1})$, where $(u_i, u'_i) \in R_{w[i]}$ for every $i \le m$, and $u'_{m+1} \in \{\#\}^*$. In order to give the precise definition of $(u, u')$, we need to introduce some concepts first.

Let $\sigma_\#(a) \in \Gamma \cup \{\#\}$ be $\#$ if $\sigma(a) = \varepsilon$, or $\sigma(a)$ otherwise; likewise for $\sigma'_\#$. By definition of strict codirect solution, we have the following.

Claim 5.8.1. $\sigma(w[1]) \in \Gamma$.

Proof. Indeed, if $\sigma(w[1]) \notin \Gamma$, then $\sigma(w[1]) = \varepsilon$ and $|\sigma(w[1])| = 0$, and then condition 4 of strict codirectness, stating that $|\sigma(w[1])| > |v_1|$, would be falsified. □

Let us define the function $g : [m] \to [m]$ so that $g(i)$ is the minimum $j$ such that $v_1 \cdots v_j = \sigma(w[1..i])$. Note that there is always such a $j$, since $|\sigma(w[1..i])| > 0$ by Claim 5.8.1. Now we show some easy properties of $g$, necessary to correctly define the witnessing pair $(u, u') \in R$ such that $u \sqsubseteq u'$.

Claim 5.8.2. $g(i) > i$ for all $1 \le i < m$, and $g(m) = m$.

Proof. Let $g(i) = j$ and hence $|\sigma(w[1..i])| = |v_1 \cdots v_j|$. First, notice that $|v_1 \cdots v_j| = |\sigma(w[1..i])| \ge |v_1 \cdots v_i|$ by condition 3 of codirectness, and then that $j \ge i$. If $i < m$, $|v_1 \cdots v_i| < |\sigma(w[1..i])|$ by condition 4, and thus $|v_1 \cdots v_i| < |v_1 \cdots v_j|$, which implies $i < j$. If $i = m$, then $j = i$ by the fact that $j \ge i = m$. □

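Claim 5.8.3. $g$ is increasing: $g(i) \ge g(j)$ if $i \ge j$.

Proof. Given $m \ge i \ge j \ge 1$, we have that

$$\begin{array}{rcll} |v_1 \cdots v_{g(i)}| &=& |\sigma(w[1..i])| & \text{(by definition of } g\text{)} \\ &\ge& |\sigma(w[1..j])| & \text{(since } i \ge j\text{)} \\ &=& |v_1 \cdots v_{g(j)}| & \text{(by definition of } g\text{)} \end{array}$$

which implies that $g(i) \ge g(j)$. □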
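The function $g$ can be computed directly from the witnesses by comparing prefix lengths. The sketch below assumes alphabetic morphisms (so each $v_j$ has length at most one and prefix lengths never skip a value), uses 1-based indices as in the text, and relies on condition 2, under which equality of the words $v_1 \cdots v_j$ and $\sigma(w[1..i])$ reduces to equality of their lengths. The function name is hypothetical.

```python
def compute_g(w, vs, sigma):
    """Return g as a dict {i: g(i)} with 1-based indices.

    g(i) is the least j such that |v_1 ... v_j| == |sigma(w[1..i])|.
    """
    m = len(w)
    g = {}
    for i in range(1, m + 1):
        target = sum(len(sigma[c]) for c in w[:i])
        total = 0
        for j in range(1, m + 1):
            total += len(vs[j - 1])
            if total == target:
                g[i] = j
                break
    return g
```

For $w = xxyy$ with $\sigma(x) = a$, $\sigma(y) = \varepsilon$ and witnesses $\varepsilon, \varepsilon, a, a$, this gives $g(1) = 3$ and $g(2) = g(3) = g(4) = 4$, in line with Claims 5.8.2 and 5.8.3.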

Observation 5.9. For all $i \le m$, if $\sigma(w[i]) \in \Gamma$ then $\sigma(w[i]) = \sigma'(w[g(i)])$.

The most important pairs of positions $(i, j) \in [m] \times [m]$ that witness $u \sqsubseteq u'$ are those so that $j = g(i)$ and $\sigma(w[i]) \neq \varepsilon$. Once those are fixed, the remaining elements in the definition of $g$ are also fixed. Let us call this set $G$, and let us state some simple facts for later use.

$$G = \{(i, g(i)) \in [m] \times [m] \mid \sigma(w[i]) \in \Gamma\}$$

Observation 5.10. For every $(i, j), (i', j') \in G$, if $i \neq i'$ then $j \neq j'$. In other words, $g$ restricted to $\{i \mid \sigma(w[i]) \in \Gamma\}$ is injective.

Claim 5.10.1. Given $i, j$ with $(i, j) \in G$ and $i < m$, then $|\sigma(w[i..j])| \ge 2$.

Proof. This is because $i < j$ by Claim 5.8.2, $\sigma(w[i]) \in \Gamma$ by definition of $G$, and $\sigma(w[j]) = \sigma(w[g(i)]) \in \Gamma$ by definition of $g$. □

Figure 1: Exemplary reduction from $\mathrm{PEP}^{\mathrm{reg}}$ to $(\mathrm{REG} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$, for the case $\sigma(w) = abacaba$, $\sigma'(w) = aababacacbcba$.



Since our coding uses the letter $\#$ as some sort of blank symbol, it will be useful to define the factors $\hat u_1, \hat u_2, \ldots$ of $u$ that contain exactly one letter from $\Gamma$. We then define $\hat u_i$ as the maximal prefix of $u_i \cdots u_m$ belonging to the following regular expression: $\Gamma \cdot \{\#\}^*$.

We are now in good shape to define precisely $u_j, u'_j$ for every $j \in [m]$. For every $j < m$,

• if $(i, j) \in G$ for some $i$, then $u'_j = \hat u_i$ and $u_j = \sigma_\#(w[j]) \cdot \hat u_i[2..]$; and

• if there is no $i$ so that $(i, j) \in G$, then $(u_j, u'_j) = (\sigma_\#(w[j]), \sigma'_\#(w[j]))$.

And on the other hand, $(u_m, u'_m) = (\sigma_\#(w[m]), \sigma'_\#(w[m]))$ and $u'_{m+1} = \#^{|u_1 \cdots u_m|}$. Figure 1 contains an example with all the previous definitions. Notice that the definition of $\hat u_i$ makes use of $u_j$ and the definition of $u_j$ seems to make use of $\hat u_i$. We next show that in fact $\hat u_i$ does not depend on $u_j$, and that the strings above are well defined.

Observation 5.11. For $i < m$, $\hat u_i$ is a prefix of $u_i \cdots u_{g(i)-1}$.

Proof. By Claim 5.8.2 and Claim 5.10.1, $\sigma(w[i..g(i)])$ contains at least two elements and hence $u_i \cdots u_{g(i)}$ contains at least two elements from $\Gamma$, namely $u_i[1]$ and $u_{g(i)}[1]$. Then, $\hat u_i$ cannot contain $u_i \cdots u_{g(i)-1} \cdot (u_{g(i)}[1])$ as a prefix. □

By the above Observation 5.11, to compute $u_j$ we only need $u_k$'s and $u'_k$'s with $k < j$, and hence $(u, u')$ is well defined.



Observation 5.12. All the $u_i$'s, $u'_i$'s and $\hat u_i$'s are of the form $a \cdot \# \cdots \#$ or $\# \cdots \#$, for $a \in \Gamma$.

From the definition of $(u, u')$ we obtain the following.

Observation 5.13. For every $n \le m$,

(1) $|(u_1 \cdots u_n)_\Gamma| = |\{i \in [n] \mid \exists j.\,(i,j) \in G\}| = |\sigma(w[1..n])|$, and

(2) $|(u'_1 \cdots u'_n)_\Gamma| = |\{j \in [n] \mid \exists i.\,(i,j) \in G\}| = |\sigma'(w[1..n])|$.

We now show that $(u, u') \in R$ and that $u \sqsubseteq u'$.

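Claim 5.13.1. $(u, u') \in R$.

Proof. Note that $u_i[1] = \sigma_\#(w[i])$ for all $i$ and then $(u_i)_\Gamma = \sigma(w[i])$.

We also show that $(u'_j)_\Gamma = \sigma'(w[j])$. If $u'_j$ is such that there is no $(i, j) \in G$, or $j = m$, then it is plain that $(u'_j)_\Gamma = \sigma'(w[j])$ by definition of $u'_j$. On the other hand, if $u'_j = \hat u_i$ for $(i, j) \in G$, then

$$\begin{array}{rcll} (u'_j)_\Gamma = (\hat u_i)_\Gamma = (u_i)_\Gamma &=& (u_i[1])_\Gamma & \text{(by Observation 5.12)} \\ &=& (\sigma_\#(w[i]))_\Gamma & \text{(by def. of } u_i\text{)} \\ &=& \sigma(w[i]) & \text{(since } \sigma(w[i]) \in \Gamma \text{ by def. of } G\text{)} \\ &=& \sigma'(w[g(i)]) = \sigma'(w[j]). & \text{(by Observation 5.9)} \end{array}$$

Thus, every $(u_i, u'_i)$ with $i \le m$ is such that $(u_i)_\Gamma = \sigma(w[i])$ and $(u'_i)_\Gamma = \sigma'(w[i])$, meaning that $(u_i, u'_i) \in R_{w[i]}$ for every $i \le m$. Hence, we have that $(u_1 \cdots u_m, u'_1 \cdots u'_m) \in R'$ and since $u'_{m+1} \in \{\#\}^*$, $(u, u') \in R$. □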

Next, we prove that $u \sqsubseteq u'$, but before doing so, we need an additional straightforward claim. Let $\{i_1 < \cdots < i_{|G|}\} = \{i \mid (i, g(i)) \in G\}$. Note that $i_1 = 1$ by Claim 5.8.1.

Claim 5.13.2. $i_{j+1} \le g(i_j)$.

Proof. By means of contradiction, suppose $g(i_j) < i_{j+1}$. Then,

$$\begin{array}{rcll} |\sigma(w[1..g(i_j)])| &=& |\{k \in [g(i_j)] \mid \exists \ell.\,(k,\ell) \in G\}| & \text{(by Observation 5.13 (1))} \\ &=& |\{\ell \in [g(i_j)] \mid \exists k.\,(k,\ell) \in G\}| & \text{(since } g(i_j) < i_{j+1}\text{)} \\ &=& |\sigma'(w[1..g(i_j)])|. & \text{(by Observation 5.13 (2))} \end{array}$$

In other words, there is some $k < m$ such that $|\sigma(w[1..k])| = |\sigma'(w[1..k])|$. This is in contradiction with condition 4 of strict codirectness. Hence, $g(i_j) \ge i_{j+1}$. □

Claim 5.13.3. $u \sqsubseteq u'$.

Proof. We factorize $u = \tilde u_1 \cdots \tilde u_{|G|}$ and we show that each $\tilde u_j$ is a substring of $u'$, and that these substrings appear in increasing order.

We define $\tilde u_j = u_{i_j} \cdots u_{i_{j+1}-1}$ for every $j < |G|$, and $\tilde u_{|G|} = u_{i_{|G|}} \cdots u_m$. Hence, the $\tilde u_j$'s form a factorization of $u$. Indeed, this is the unique factorization in which each $\tilde u_j$ is of the form $b \cdot \# \cdots \#$ for $b \in \Gamma$.

For every $j < |G|$, we show that $\tilde u_j \sqsubseteq u'_{g(i_j)}$:

$$\begin{array}{rcll} \tilde u_j &=& u_{i_j} \cdots u_{i_{j+1}-1} & \\ &\sqsubseteq& u_{i_j} \cdots u_{g(i_j)-1} & \text{(by Claim 5.13.2)} \\ &\sqsubseteq& \hat u_{i_j} & \text{(by Observation 5.11)} \\ &=& \hat u_{g^{-1}(g(i_j))} & \text{(by Observation 5.10)} \\ &=& u'_{g(i_j)} & \text{(by def. of } u'\text{)} \end{array}$$

On the other hand, $\tilde u_{|G|} \sqsubseteq u'_{g(i_{|G|})} \cdot u'_{m+1} = u'_m \cdot u'_{m+1}$. By Claim 5.8.3, $g$ is increasing. Hence, $u \sqsubseteq u'$. □

By Claims 5.13.1 and 5.13.3, we conclude that $R \cap \sqsubseteq \neq \emptyset$. □

5.2.3. Subsequence-closed relations. The next question is how far we can extend the decidability of $(\mathrm{RAT} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$. It turns out that if we allow one projection of a rational relation to be closed under taking subsequences, then we retain decidability.

Let $R \subseteq \Sigma^* \times \Gamma^*$ be a binary relation. Define another binary relation

$$R_{\sqsubseteq} = \{(u, w) \mid u \sqsubseteq u' \text{ and } (u', w) \in R \text{ for some } u'\}.$$

Then the class of subsequence-closed relations, or SCR, is the class $\{R_{\sqsubseteq} \mid R \in \mathrm{RAT}\}$. Note that the subsequence relation itself is in SCR, since it is obtained by closing the (regular) equality relation under subsequence. That is, $\sqsubseteq \;=\; \{(w, w) \mid w \in \Sigma^*\}_{\sqsubseteq}$. Not all rational relations are subsequence-closed (for instance, subword is not).
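For a finite relation the closure $R_{\sqsubseteq}$ can be computed by enumeration. This is only meant to illustrate the definition: relations in SCR are in general infinite and are handled through automata, and the function names below are hypothetical.

```python
from itertools import combinations


def subsequences(u):
    """The set of all subsequences of the string u."""
    return {
        "".join(u[i] for i in idx)
        for r in range(len(u) + 1)
        for idx in combinations(range(len(u)), r)
    }


def subsequence_closure(relation):
    """Close the first component of a finite relation under subsequence."""
    return {(u, w) for (u_prime, w) in relation for u in subsequences(u_prime)}
```

Closing the single pair $(ab, c)$ yields the four pairs $(\varepsilon, c)$, $(a, c)$, $(b, c)$ and $(ab, c)$; closing a fragment of the equality relation yields the corresponding fragment of $\sqsubseteq$.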

The following summarizes properties of subsequence-closed relations. 

Proposition 5.14.

(1) $\mathrm{SCR} \subsetneq \mathrm{RAT}$.

(2) $\mathrm{SCR} \not\subseteq \mathrm{REG}$ and $\mathrm{REG} \not\subseteq \mathrm{SCR}$.

(3) A relation $R$ is in SCR iff $\{w \otimes w' \mid (w, w') \in R\}$ is accepted by an NFA $A = (Q, \Sigma_\bot \times \Sigma_\bot, q_0, \delta, F)$ such that $(q, (a, b), q') \in \delta$ implies $(q, (\bot, b), q') \in \delta$ for all $q, q' \in Q$ and $a, b \in \Sigma_\bot$. We call an automaton with such a property a subsequence-closed automaton.

Note that (3) is immediate by definition of $R_{\sqsubseteq}$, (1) is a consequence of (3), and (2) is due to the fact that $\sqsubseteq$ is not regular and that, for example, the identity $\{(u, u) \mid u \in \Sigma^*\}$ is not a subsequence-closed relation.

When an SCR relation is given as an input to a problem, we assume that it is represented 
as a subsequence-closed automaton as defined in item (3) in the above proposition. 
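The syntactic condition of item (3) can be enforced or tested on a transition set as follows. The sketch represents $\delta$ as a Python set of triples and uses the string `"_"` as a stand-in for $\bot$; both choices, and the function names, are assumptions made for illustration. It only establishes the syntactic closure condition and says nothing about which relation the resulting automaton accepts.

```python
BOT = "_"  # stand-in for the padding symbol ⊥


def close_transitions(delta):
    """Add (q, (BOT, b), q2) for every transition (q, (a, b), q2)."""
    return set(delta) | {(q, (BOT, b), q2) for (q, (_a, b), q2) in delta}


def is_subsequence_closed(delta):
    """True iff delta already satisfies the closure condition of item (3)."""
    return close_transitions(delta) == set(delta)
```

Applying `close_transitions` once is enough, since the added transitions already have $\bot$ in their first component.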

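Note also that $(\mathrm{SCR} \cap \mathrm{SCR}) \stackrel{?}{=} \emptyset$ is decidable in polynomial time: if $R, R' \in \mathrm{SCR}$ and $R \cap R' \neq \emptyset$, then $(\varepsilon, w) \in R \cap R'$ for some $w$, and hence the problem reduces to simple NFA nonemptiness checking.

The main result about SCR relations generalizes decidability of $(\mathrm{RAT} \cap \sqsubseteq) \stackrel{?}{=} \emptyset$.

Theorem 5.15. The problem $(\mathrm{RAT} \cap \mathrm{SCR}) \stackrel{?}{=} \emptyset$ is decidable, with non-multiply-recursive complexity.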

In order to prove Theorem 5.15 we use Lemmas 5.16 and 5.17, as shown below. But 
first we need to introduce some additional terminology. We say that (^O)-^i) is an instance 
of (RAT n SCR) = over S, P if ^1 is a subsequence-closed automaton over T,± x r_L, and 

^0 is a NFA over S_l x r_L. Given a (RAT n SCR) = instance (^O) -^1) over S, F, we say 
that {wi,W2) is a solution if wi,W2 G (S_l x r_L)*, wi G C{Ai),W2 G C{Ao). We say that a 
solution {wo,wi) of an instance {Ao,Ai) over T,,T is synchronized if 7r2(tt;o) = '^2{wi)- We 

write (RAT n SCR)*^^"^ = for the problem of whether there is a synchronized solution. 

Lemma 5.16. There is a polynomial-time reduction from the problem (RAT n SCR) = 
into (RAT n SCR) 



syn 



7 
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Proof. We show that (RAT n SCR) = is reducible to the problem of whether there exists 
a synchronized solution of (RAT n SCR) = 0. Suppose that (^o^-^i) is an instance of 
(RAT n SCR) = over the alphabets S,r. Consider the automata ^q,^']^ as the result of 
adding all transitions {q, (_L, _L), q) for every possible state q to both automata. It is clear 
that the relations recognized by these remain unchanged, and that Aq is still a subsequence- 
closed automaton. Moreover, this new instance has a synchronized solution if there is any, 
as stated in the following claim. 

Claim 5.16.1. There is a synchronized solution for (^g,^'^) if, and only if, there is a 
solution for {Ao,Ai). 

The 'only if' part is immediate. For the 'if' part, let $(w_0, w_1)$ be a solution for $(A_0, A_1)$.
Let $w_0 = w_{0,1} \cdots w_{0,n}$, $w_1 = w_{1,1} \cdots w_{1,n}$ be factorizations of $w_0$ and $w_1$ such that for every
$i \in \{0, 1\}$, $\pi_2(w_{i,1})$ is in $\{\bot\}^*$; and for each $j > 1$, $i \in \{0, 1\}$, $\pi_2(w_{i,j})$ is in $\Gamma \cdot \{\bot\}^*$. It is plain
that there is always such a factorization and that it is unique.

For every $j \in [n]$, we define $w'_{0,j} = w_{0,j} \cdot (\bot,\bot)^{k}$ and $w'_{1,j} = w_{1,j} \cdot (\bot,\bot)^{-k}$, with
$k = |w_{1,j}| - |w_{0,j}|$, where we assume that $(\bot,\bot)^{m}$ with $m < 0$ is the empty string. We
define $w'_0 = w'_{0,1} \cdots w'_{0,n}$, $w'_1 = w'_{1,1} \cdots w'_{1,n}$. Note that $(w'_0, w'_1)$ is a solution of $(A'_0, A'_1)$
since it is the result of adding letters $(\bot, \bot)$ to $(w_0, w_1)$, which is also a solution of $(A'_0, A'_1)$.
We have that $\pi_2(w'_0) = \pi_2(w'_1)$, and therefore that $(w'_0, w'_1)$ is a synchronized solution for
$(A'_0, A'_1)$. □
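
The padding step in the proof of Claim 5.16.1 can be illustrated with a small sketch. It is not part of the paper: the representation of letters as pairs with `None` standing for $\bot$, and the names `blocks` and `synchronize`, are assumptions made for illustration. The sketch also assumes that both words carry the same $\Gamma$-letters in the same order, so that the two factorizations have the same number of blocks.

```python
from typing import List, Optional, Tuple

# A letter of (Sigma_bot x Gamma_bot); None plays the role of the padding symbol (assumption).
Letter = Tuple[Optional[str], Optional[str]]
PAD: Letter = (None, None)


def blocks(word: List[Letter]) -> List[List[Letter]]:
    """Factorize a word: the first block has only padding in its second
    component, and every later block starts at a letter carrying a Gamma symbol."""
    result: List[List[Letter]] = [[]]
    for letter in word:
        if letter[1] is not None:
            result.append([])
        result[-1].append(letter)
    return result


def synchronize(w0: List[Letter], w1: List[Letter]) -> Tuple[List[Letter], List[Letter]]:
    """Pad corresponding blocks with (bot, bot) letters until they have equal length."""
    b0, b1 = blocks(w0), blocks(w1)
    if len(b0) != len(b1):
        raise ValueError("the two words carry a different number of Gamma letters")
    out0: List[Letter] = []
    out1: List[Letter] = []
    for x, y in zip(b0, b1):
        width = max(len(x), len(y))
        out0 += x + [PAD] * (width - len(x))
        out1 += y + [PAD] * (width - len(y))
    return out0, out1


w0 = [("a", None), ("b", "x"), (None, "y")]
w1 = [("c", "x"), ("d", None), ("e", "y")]
s0, s1 = synchronize(w0, w1)
# After padding, the second projections coincide position by position.
print([l[1] for l in s0] == [l[1] for l in s1])
```

Since only $(\bot,\bot)$ letters are inserted, the padded words are accepted by the automata extended with the $(\bot,\bot)$ self-loops, which is the point of the claim.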

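Lemma 5.17. There is a polynomial-time reduction from $(\mathrm{RAT} \cap \mathrm{SCR})^{\mathrm{syn}} \stackrel{?}{=} \emptyset$ into
$(\mathrm{RAT} \cap {\sqsubseteq}) \stackrel{?}{=} \emptyset$.

Proof. The problem of finding a synchronized solution for $A_0, A_1$ can then be formulated as
the problem of finding words $v \in \Gamma_\bot^*$ and $u_0, u_1 \in \Sigma_\bot^*$ with $|v| = |u_0| = |u_1|$, so that $(u_0 \otimes v, u_1 \otimes v)$ is a
solution. We can compute an NFA $A$ over $\Sigma_\bot^2 \times \Gamma_\bot$ from $(A_0, A_1)$ such that $(u_0, u_1, v) \in \mathcal{L}(A)$
if, and only if, $u_0 \otimes v \in \mathcal{L}(A_0)$ and $u_1 \otimes v \in \mathcal{L}(A_1)$. Consider now an automaton $A'$ over
$\Sigma_\bot^2$ such that $\mathcal{L}(A') = \{(u_0, u_1) \mid \exists v\, (u_0, u_1, v) \in \mathcal{L}(A)\}$. It corresponds to the rational
automaton of the projection onto the first and second components of the ternary relation
of $A$, and it can be computed from $A$ in polynomial time. We then deduce that there exists
$u_0 \otimes u_1 \in \mathcal{L}(A')$ so that $(u_0)_\Sigma \sqsubseteq (u_1)_\Sigma$ if, and only if, there is $v \in \Gamma_\bot^*$ with $|v| = |u_0| = |u_1|$
so that $u_0 \otimes v \in \mathcal{L}(A_0)$ and $u_1 \otimes v \in \mathcal{L}(A_1)$, where $(u_0)_\Sigma \sqsubseteq (u_1)_\Sigma$. But this condition is in
fact equivalent to $R_0 \cap R_1 \neq \emptyset$ (where $R_i = \{((u)_\Sigma, (v)_\Gamma) \mid u \otimes v \in \mathcal{L}(A_i)\}$), since

• if $((u_1)_\Sigma, (v)_\Gamma) \in R_1$ and $(u_0)_\Sigma \sqsubseteq (u_1)_\Sigma$, then $((u_0)_\Sigma, (v)_\Gamma) \in R_1$ (since $R_1 \in \mathrm{SCR}$)
and hence $((u_0)_\Sigma, (v)_\Gamma) \in R_0 \cap R_1$; and

• if $R_0 \cap R_1 \neq \emptyset$, then there exists a synchronized solution $(u_0 \otimes v, u_1 \otimes v)$ of $A_0, A_1$;
in other words, there are $|v| = |u_0| = |u_1|$ so that $u_0 \otimes v \in \mathcal{L}(A_0)$, $u_1 \otimes v \in \mathcal{L}(A_1)$,
and $(u_0)_\Sigma = (u_1)_\Sigma$.

We have thus reduced the problem to an instance of $(\mathrm{RAT} \cap {\sqsubseteq}) \stackrel{?}{=} \emptyset$: whether there is $(u, v)$
in the relation denoted by $A'$ so that $u \sqsubseteq v$. □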

Proof of Theorem 5.15. The decidability part of Theorem 5.15 follows as a corollary of
Lemmas 5.16 and 5.17, and Proposition 5.4. Of course the complexity is non-multiply-
recursive, since the problem subsumes $(\mathrm{REG} \cap {\sqsubseteq}) \stackrel{?}{=} \emptyset$ of Theorem 5.5. □

Coming back to graph logics, we obtain:

Corollary 5.18. The complexity of evaluation of $\mathrm{ECRPQ}(\sqsubseteq)$ queries is not bounded by
any multiply-recursive function.

Another corollary can be stated in purely language-theoretic terms. 

Corollary 5.19. Let $\mathcal{C}$ be a class of binary relations on $\Sigma^*$ that is closed under intersection
and contains REG. Then the nonemptiness problem for $\mathcal{C}$ is:

• undecidable if $\preceq$ or $\preceq_{\mathrm{suff}}$ is in $\mathcal{C}$;

• non-multiply-recursive if $\sqsubseteq$ is in $\mathcal{C}$.

5.3. Discussion. In addition to answering some basic language-theoretic questions about
the interaction of regular and rational relations, and to providing what is so far the simplest
problem with non-multiply-recursive complexity, our results also rule out logical languages for graph
databases that freely combine regular relations and some of the most commonly used
rational relations, such as subword and subsequence. With them, query evaluation becomes
either undecidable or non-multiply-recursive (which means that no realistic algorithm will
be able to solve the hard instances of this problem).

This does not yet fully answer our questions about the evaluation of queries in graph
logics. First, in the case of subsequence (or, more generally, SCR relations) we still do not
know if query evaluation of ECRPQs with such relations is decidable (i.e., what happens
with $\mathrm{GenInt}_S(\mathrm{REG})$ for such relations $S$).

Even more importantly, we do not yet know what happens with the complexity of
CRPQs (i.e., $\mathrm{GenInt}_S(\mathrm{REC})$) for various relations $S$. These questions are answered in the
next section.

6. Restricted logics and the generalized intersection problem 

The previous section already ruled out some graph logics with rational relations as 
either undecidable or decidable with extremely high complexity. This was done merely by 
analyzing the intersection problem for binary rational and regular relations. We now move 
to the study of the generalized intersection problem, and use it to analyze the complexity 
of graph logics in full generality. We first deal with the generalization of the decidable case 
(SCR relations), and then consider the problem $\mathrm{GenInt}_S(\mathrm{REC})$, corresponding to CRPQs
extended with relations $S$ on paths.

6.1. Generalized intersection problem and subsequence. We know that $(\mathrm{REG} \cap {\sqsubseteq}) \stackrel{?}{=} \emptyset$
is decidable, although not multiply-recursive. What about its generalized version? It turns
out it remains decidable.

Theorem 6.1. The problem $\mathrm{GenInt}_{\sqsubseteq}(\mathrm{REG})$ is decidable. That is, there is an algorithm
that decides, for a given $m$-ary regular relation $R$ and $I \subseteq [m]^2$, whether $R \cap_I {\sqsubseteq} \neq \emptyset$.

Proof. Let $k \in \mathbb{N}$, $I \subseteq [k] \times [k]$ and $R \in \mathrm{REG}_k$ be an instance of the problem. Let us define
$G = \{(w_1, \dots, w_k) \mid \forall (i, j) \in I,\ w_i \sqsubseteq w_j\}$. We show how to compute whether $R \cap G$ is empty or
not. Let $A = (Q, (\Sigma_\bot)^k, q_0, \delta, F)$ be an NFA over $(\Sigma_\bot)^k$ corresponding to $R$; for simplicity
we assume that it is complete. Remember that every $w \in \mathcal{L}(A)$ is such that $\pi_i(w)$ is in
$\Sigma^* \cdot \{\bot\}^*$ for every $i \in [k]$.

Given $u, v \in \Sigma^*$, we define $u \setminus v$ as $u[i..]$, where $i$ is the maximal index such that
$u[1..i-1] \sqsubseteq v$. In other words, $u \setminus v$ is the result of removing from $u$ the maximal prefix
that is a subsequence of $v$.
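
As a quick illustration of the operation $u \setminus v$, here is a minimal sketch. The function name `residual` is ours, not the paper's. It relies on the standard fact that greedy left-to-right matching finds the longest prefix of $u$ that embeds into $v$ as a subsequence.

```python
def residual(u: str, v: str) -> str:
    """Return u with its maximal prefix that is a subsequence of v removed."""
    remaining = iter(v)
    matched = 0
    for symbol in u:
        # `symbol in remaining` advances the iterator up to and including the
        # first occurrence of symbol, or exhausts it if there is none.
        if symbol in remaining:
            matched += 1
        else:
            break
    return u[matched:]


print(residual("abc", "xaybz"))  # the prefix "ab" embeds into "xaybz"; "c" is left over
```

In particular, `residual(u, v)` is empty exactly when $u \sqsubseteq v$.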

We define a finite tree $t$ whose every node is labeled with

• a depth $n \geq 0$,

• $k$ words $w_1, \dots, w_k \in \Sigma_\bot^n$,

• for every $(i, j) \in I$, a word $\alpha_{ij} \in \Sigma^*$, and

• a state $q \in Q$.

For a node $x$ we denote these labels by $x.n$, $x.w_1, \dots, x.w_k$, $x.\alpha_{ij}$ for every $(i, j) \in I$
and $x.q$ respectively. The tree is such that the following conditions are met.

• The root is labeled by $x.n = 0$, $x.w_1 = \cdots = x.w_k = \varepsilon$, for every $(i,j) \in I$, $x.\alpha_{ij} = \varepsilon$,
and $x.q = q_0$.

• A node $x$ has a child $y$ in $t$ if and only if

- $y.n = x.n + 1$,

- $x.w_i = y.w_i[1..y.n - 1]$ for every $i \in [k]$,

- there is a transition $(x.q, a, y.q) \in \delta$ with $a = (y.w_i[y.n])_{i \in [k]}$, and

- $y.\alpha_{ij} = (y.w_i)_\Sigma \setminus (y.w_j)_\Sigma$ for every $(i,j) \in I$.

• A node $x$ is a leaf in $t$ if and only if it is final or saturated (as defined below).

A node $x$ is final if $x.q \in F$ and $x.\alpha_{ij} = \varepsilon$ for all $(i,j) \in I$. It is saturated if it is not
final and there is an ancestor $y \neq x$ such that $y.q = x.q$ and $y.\alpha_{ij} \sqsubseteq x.\alpha_{ij}$ for all $(i,j) \in I$.
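
The tree $t$ can be explored depth-first, stopping at final and saturated nodes. The following is only a sketch of that exploration under assumptions of our own: the automaton is passed as a dictionary `delta` from `(state, letter)` to a set of successor states, a letter is a $k$-tuple whose entries are symbols or `None` (standing for $\bot$), tapes are indexed from 0, and the automaton accepts only well-formed convolutions. The names `residual`, `is_subsequence` and `gen_int_nonempty` are hypothetical.

```python
def is_subsequence(u, v):
    remaining = iter(v)
    return all(symbol in remaining for symbol in u)


def residual(u, v):
    """u with its maximal prefix that is a subsequence of v removed."""
    remaining = iter(v)
    matched = 0
    for symbol in u:
        if symbol in remaining:
            matched += 1
        else:
            break
    return u[matched:]


def gen_int_nonempty(delta, q0, finals, pairs, k):
    """Search the tree for a final node; saturated nodes are not expanded."""

    def explore(q, words, ancestors):
        res = {(i, j): residual(words[i], words[j]) for (i, j) in pairs}
        # Final node: accepting state and every residual is empty.
        if q in finals and all(r == "" for r in res.values()):
            return True
        # Saturated node: some ancestor has the same state and smaller residuals.
        for anc_q, anc_res in ancestors:
            if anc_q == q and all(is_subsequence(anc_res[p], res[p]) for p in pairs):
                return False
        for (src, letter), targets in delta.items():
            if src != q:
                continue
            # Padding contributes nothing to the Sigma-projection of a tape.
            new_words = tuple(w + (c or "") for w, c in zip(words, letter))
            for q2 in targets:
                if explore(q2, new_words, ancestors + [(q, res)]):
                    return True
        return False

    return explore(q0, ("",) * k, [])


# R = {(a^n, a^m) : n > m}; asking for w_0 to be a subsequence of w_1 is impossible.
delta = {(0, ("a", "a")): {0}, (0, ("a", None)): {1}, (1, ("a", None)): {1}}
print(gen_int_nonempty(delta, 0, {1}, [(0, 1)], 2))
```

Termination of this search on every input rests on the finiteness of $t$; the sketch makes no attempt to be efficient, in line with the complexity of the problem.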

Lemma 6.2. The tree $t$ is finite and computable.

Proof. The root is obviously computable, and for every branch, one can compute the list of
children nodes of the bottom-most node of the branch. Indeed these are finite and bounded.
The tree $t$ cannot have an infinite branch. If there were an infinite branch, then as a result
of Higman's Lemma cum Dickson's Lemma (and the Pigeonhole principle) there would be
two nodes $x \neq y$, where $x$ is an ancestor of $y$, $x.q = y.q$, and for all $(i,j) \in I$, $x.\alpha_{ij} \sqsubseteq y.\alpha_{ij}$.
Therefore, $y$ is saturated and it does not have children, contradicting the fact that $x$ and $y$
are in an infinite branch of $t$. Since all the branches are finite and the children of any node
are finite, by König's Lemma, $t$ is finite, and computable. □

Lemma 6.3. If $t$ has a final node, $R \cap G \neq \emptyset$.

Proof. If a leaf $x$ is final, consider all the $x.n$ ancestors of $x$: $x_0, \dots, x_{n-1}$, such that $x_i.n = i$
for every $i \in [0..n-1]$, where $n = x.n$. Consider the run $\rho : [0..x.n] \to Q$ defined as $\rho(x.n) = x.q$ and $\rho(i) = x_i.q$
for $i < x.n$. It is easy to see that $\rho$ is an accepting run of $A$ on $x.w_1 \otimes \cdots \otimes x.w_k$ and therefore
that $((x.w_1)_\Sigma, \dots, (x.w_k)_\Sigma) \in R$. On the other hand, for every $(i, j) \in I$, $(x.w_i)_\Sigma \sqsubseteq (x.w_j)_\Sigma$,
since $x.\alpha_{ij} = \varepsilon$. Hence, $((x.w_1)_\Sigma, \dots, (x.w_k)_\Sigma) \in G$ and thus $R \cap G \neq \emptyset$. □

Lemma 6.4. If all the leaves of $t$ are saturated, $R \cap G = \emptyset$.

Proof. By means of contradiction suppose that there is $w = w_1 \otimes \cdots \otimes w_k \in (\Sigma_\bot^k)^*$ such that
$w \in \mathcal{L}(A)$ through an accepting run $\rho : [0..n] \to Q$, and for every $(i,j) \in I$, $(w_i)_\Sigma \sqsubseteq (w_j)_\Sigma$.
Let $|w| = n$ be of minimal size.

By construction of $t$, the following claims follow.

Claim 6.4.1. There is a maximal branch $x_0, \dots, x_m$ in $t$ such that $x_i.n = i$, $x_i.w_j =
w_j[1..i]$, $x_i.q = \rho(i)$ for every $i \in [0..m]$ and $j \in [k]$.

Claim 6.4.2. For every $\ell \in [0..m]$ and $(i, j) \in I$,

$$x_\ell.\alpha_{ij} \cdot (w_i[\ell + 1..])_\Sigma \sqsubseteq (w_j[\ell + 1..])_\Sigma, \qquad (6.1)$$

$$(w_i[1..\ell - |x_\ell.\alpha_{ij}|])_\Sigma \sqsubseteq (w_j[1..\ell])_\Sigma. \qquad (6.2)$$

Since we assume that all the leaves of $t$ are saturated, in particular $x_m$ is saturated and
there must be some $m' < m$ such that $x_m$ and $x_{m'}$ verify the saturation conditions.
Consider the following word.

$$w' = w[1..m'] \cdot w[m + 1..]$$

The run $\rho$ trimmed with the positions $[m' + 1..m]$ is still an accepting run on $w'$ (since
$\rho(m') = \rho(m)$), and therefore $((\pi_1(w'))_\Sigma, \dots, (\pi_k(w'))_\Sigma) \in R$.

For an arbitrary $(i,j) \in I$, we show that $(\pi_i(w'))_\Sigma \sqsubseteq (\pi_j(w'))_\Sigma$. First, note that by
(6.2) we have that

$$(\pi_i(w')[1..m' - |x_{m'}.\alpha_{ij}|])_\Sigma = (w_i[1..m' - |x_{m'}.\alpha_{ij}|])_\Sigma \sqsubseteq (w_j[1..m'])_\Sigma \quad \text{(by (6.2))}.$$

Since $x_{m'}$ and $x_m$ verify the saturation conditions, $x_{m'}.\alpha_{ij} \sqsubseteq x_m.\alpha_{ij}$. Therefore,

$$\begin{aligned}
(\pi_i(w')[m' - |x_{m'}.\alpha_{ij}| + 1..])_\Sigma &= (\pi_i(w')[m' - |x_{m'}.\alpha_{ij}| + 1..m'])_\Sigma \cdot (\pi_i(w')[m' + 1..])_\Sigma \\
&= x_{m'}.\alpha_{ij} \cdot (w_i[m + 1..])_\Sigma \\
&\sqsubseteq x_m.\alpha_{ij} \cdot (w_i[m + 1..])_\Sigma && \text{(since } x_{m'}.\alpha_{ij} \sqsubseteq x_m.\alpha_{ij}\text{)} \\
&\sqsubseteq (w_j[m + 1..])_\Sigma && \text{(by (6.1))} \\
&= (\pi_j(w')[m' + 1..])_\Sigma
\end{aligned}$$

Hence, we showed that there are some $\ell, \ell'$ such that $(\pi_i(w')[1..\ell])_\Sigma \sqsubseteq (\pi_j(w')[1..\ell'])_\Sigma$ and
$(\pi_i(w')[\ell + 1..])_\Sigma \sqsubseteq (\pi_j(w')[\ell' + 1..])_\Sigma$, for $\ell = m' - |x_{m'}.\alpha_{ij}|$ and $\ell' = m'$. Thus, $(\pi_i(w'))_\Sigma \sqsubseteq
(\pi_j(w'))_\Sigma$.

This means that $((\pi_1(w'))_\Sigma, \dots, (\pi_k(w'))_\Sigma) \in G$ and thus $((\pi_1(w'))_\Sigma, \dots, (\pi_k(w'))_\Sigma) \in
R \cap G$. But this cannot be since $|w'| < |w|$ and $w$ is of minimal length. The contradiction
arises from the assumption that $R \cap G \neq \emptyset$. Then, $R \cap G = \emptyset$. □

Hence, by Lemmas 6.2, 6.3 and 6.4, $R \cap G \neq \emptyset$ if and only if $t$ has a final node, which
is computable. □

Corollary 6.5. The query evaluation problem for $\mathrm{ECRPQ}(\sqsubseteq)$ queries is decidable.

Of course the complexity is extremely high as we already know from Corollary 5.18. 

Note that while the intersection problem of $\sqsubseteq$ with rational relations is decidable, as is
$\mathrm{GenInt}_{\sqsubseteq}(\mathrm{REG})$, we lose the decidability of $\mathrm{GenInt}_{\sqsubseteq}(\mathrm{RAT})$ even in the simplest cases that
go beyond the intersection problem (that is, for ternary relations in RAT and any $I$ that
does not force two words to be the same).

Proposition 6.6. The problem $(\mathrm{RAT} \cap_I {\sqsubseteq}) \stackrel{?}{=} \emptyset$ is undecidable even over ternary relations
when $I$ is one of the following:

(1) {(1,2),(2,3)}, 

(2) {(1,2), (1,3)}, or 

(3) {(1,2), (3, 2)}. 



26 P. BARCELO, D. FIGUEIRA, AND L. LIBKIN 



Proof. The three proofs use a reduction from the PCP problem. Recall that this is defined
as follows. The input consists of two equally long lists $u_1, u_2, \dots, u_n$ and $v_1, v_2, \dots, v_n$ of strings
over alphabet $\Sigma$. The PCP problem asks whether there exists a solution for this input, that
is, a sequence of indices $i_1, i_2, \dots, i_k$ such that $1 \leq i_j \leq n$ ($1 \leq j \leq k$) and
$u_{i_1} u_{i_2} \cdots u_{i_k} = v_{i_1} v_{i_2} \cdots v_{i_k}$.

(1) {(1,2), (2,3)}: The proof goes by reduction from an arbitrary PCP instance given by
lists $u_1, \dots, u_n$ and $v_1, \dots, v_n$ of strings over alphabet $\Sigma$. The following relation

$$R = \{(u_{i_1} \cdots u_{i_m},\ v_{i_1} \cdots v_{i_m},\ u_{i_1} \cdots u_{i_m}) \mid m \in \mathbb{N} \text{ and } i_1, \dots, i_m \in [n]\}$$

is rational and $R \cap \{(x, y, z) \mid x \sqsubseteq y \sqsubseteq z\}$ is non-empty if and only if the instance has a
solution.
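
The reason case (1) works is that the first and third components coincide, so $x \sqsubseteq y \sqsubseteq x$ forces $|x| = |y|$ and hence $x = y$. The sketch below checks this on a textbook PCP instance; the helper names `triple` and `in_chain` and the 0-based indexing are illustrative assumptions, and the code only tests one candidate index sequence rather than deciding PCP.

```python
def is_subsequence(u: str, v: str) -> bool:
    remaining = iter(v)
    return all(symbol in remaining for symbol in u)


def triple(us, vs, indices):
    """The element of R determined by a sequence of (0-based) indices."""
    x = "".join(us[i] for i in indices)
    y = "".join(vs[i] for i in indices)
    return x, y, x


def in_chain(t) -> bool:
    """Membership in {(x, y, z) | x is a subsequence of y, and y of z}."""
    x, y, z = t
    return is_subsequence(x, y) and is_subsequence(y, z)


us = ["a", "ab", "bba"]
vs = ["baa", "aa", "bb"]
# The sequence 3, 2, 3, 1 (1-based) is a solution of this PCP instance.
print(in_chain(triple(us, vs, [2, 1, 2, 0])))
```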

(2) {(1, 2), (1, 3)}: The proof again goes by reduction from an arbitrary PCP instance given 
by lists ui, . . . ,Un and vi, . . . ,Vn of strings over alphabet S. For simplicity, and without 
any loss of generality, we assume that l^il, |i^i| < 1 for every i. Let T, = {a\ a £ S}, and for 
every w = ai • • • ac £ Ti* , let w = ai • • • &£. Consider 

R = {{x,y,z) \m£N,ii,...,im £ [n], wi,w[, . . . ,Wm+i,w'„^^i G S*, 

X = Ui-iVi-^Ui2Vi2 • • 'Uim'^imJ 

y = w[ui,w'2 ■ ■ ■ w'^Ui^w'^^i, 

Z = WlVi^W2 ■ ■ ■ WmVi^Wm+l} 

which is a rational relation. Note that there is some (x, y,z) £ R with x Q y ii and only if 
there is some Vji • • • fj„ E ^^n • • • Ui^. Similarly for x Q z. Therefore, there is (x, y,z) £ R 
with X Q y, X Q z ii and only if Vj^ • • • Vi^ = ui^- ■ ■ Ui^ for some choice of ii, . . . , im- 

(3) {(1, 2), (3, 2)}: This is similar to (2), but this time we consider the following rational 
relation. 

R = {{x,y,z) \m£N,ii,...,im £ [n], wi,w'i, . . . ,Wm+i,w'^^i £ S*, 

y ^ ^n'yn^J2^«2 ■ ■ ''^im'^im.l 

X = w[ui,w'2 ■ ■ ■ w'^Ui^w'^^i, 
Z = WlVi^W2 ■ ■ ■ WmVi^Wm+l} 

Analogously as before, there is $(x, y, z) \in R$ with $x \sqsubseteq y$, $z \sqsubseteq y$ if and only if the PCP
instance has a solution. □



6.2. Generalized intersection problem for recognizable relations. We now consider
the problem of answering CRPQs with rational relations $S$, or, equivalently, the problem
$\mathrm{GenInt}_S(\mathrm{REC})$. Recall that an instance of such a problem consists of an $m$-ary recognizable
relation $R$ and a set $I \subseteq [m]^2$. The question is whether $R \cap_I S \neq \emptyset$, i.e., whether there exists
a tuple $(w_1, \dots, w_m) \in R$ so that $(w_i, w_j) \in S$ whenever $(i, j) \in I$. It turns out that the
decidability of this problem hinges on the graph-theoretic properties of $I$. In fact we shall
present a dichotomy result, classifying problems $\mathrm{GenInt}_S(\mathrm{REC})$ into PSPACE-complete and
undecidable depending on the structure of $I$.

Before stating the result, we need to decide how to represent a recognizable relation $R$.
Recall that an $m$-ary $R \in \mathrm{REC}$ is a union of relations of the form $L_1 \times \dots \times L_m$, where each
$L_i$ is a regular language. Hence, as the representation of $R$ we take the set of all such $L_i$'s
involved, and as the measure of its complexity, the total size of NFAs defining the $L_i$'s.

With a set $I \subseteq [m]^2$ we associate an undirected graph $G_I$ whose nodes are $1, \dots, m$
and whose edges are $\{i,j\}$ such that either $(i,j) \in I$ or $(j, i) \in I$. We call an instance of
$(\mathrm{REC} \cap_I S) \stackrel{?}{=} \emptyset$ acyclic if $G_I$ is an acyclic graph.
Now we can state the dichotomy result.
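
Whether an instance is acyclic is a purely syntactic check on $I$. A minimal sketch using union-find follows; the function name `is_acyclic_instance` is ours. It treats $(i,j)$ and $(j,i)$ as the same undirected edge and, as an assumption about the intended reading, counts a pair $(i,i)$ as a cycle.

```python
def is_acyclic_instance(m, pairs):
    """Check whether the undirected graph on nodes 1..m induced by `pairs` is a forest."""
    parent = list(range(m + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Collapse (i, j) and (j, i) into one undirected edge.
    edges = {frozenset(p) for p in pairs}
    for edge in edges:
        if len(edge) == 1:  # a self-loop (i, i)
            return False
        i, j = tuple(edge)
        root_i, root_j = find(i), find(j)
        if root_i == root_j:  # i and j were already connected: the edge closes a cycle
            return False
        parent[root_i] = root_j
    return True


print(is_acyclic_instance(3, {(1, 2), (2, 3)}))
print(is_acyclic_instance(3, {(1, 2), (2, 3), (3, 1)}))
```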

Theorem 6.7.

• Let $S$ be a binary rational relation. Then acyclic instances of $\mathrm{GenInt}_S(\mathrm{REC})$ are
decidable in PSPACE. Moreover, there is a fixed binary relation $S_0$ such that the
problem $(\mathrm{REC} \cap_I S_0) \stackrel{?}{=} \emptyset$ is PSPACE-complete.

• For every $I$ such that $G_I$ is not acyclic, there exists a binary rational relation $S$ such
that the problem $(\mathrm{REC} \cap_I S) \stackrel{?}{=} \emptyset$ is undecidable.

Proof. For PSPACE-hardness we can do an easy reduction from nonemptiness of the
intersection of $m$ given NFAs, which is known to be PSPACE-complete [26]. Given $m$
NFAs $A_1, \dots, A_m$, define the (acyclic) relation $I = \{(i, i + 1) \mid 1 \leq i < m\}$. Then
$\bigcap_i \mathcal{L}(A_i)$ is nonempty if and only if $\prod_i \mathcal{L}(A_i) \cap_I S_0 \neq \emptyset$, where $S_0$ is the regular relation
$\{(w, w) \mid w \in \Sigma^*\}$.
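
For intuition about the source problem of this reduction, the sketch below decides nonemptiness of $\bigcap_i \mathcal{L}(A_i)$ by exploring the product automaton on the fly. The NFA encoding (a triple of initial states, a dictionary from `(state, symbol)` to successor states, and final states) and the name `intersection_nonempty` are assumptions for illustration. The sketch is a deterministic breadth-first search that stores all visited product states, so it does not itself run in polynomial space; the PSPACE bound comes from guessing the path nondeterministically while storing only the current product state.

```python
from collections import deque
from itertools import product


def intersection_nonempty(nfas, alphabet):
    """nfas: list of (initial_states, delta, final_states), where delta maps
    (state, symbol) to a set of successor states (no epsilon transitions)."""
    starts = list(product(*(init for init, _, _ in nfas)))
    seen = set(starts)
    queue = deque(starts)
    while queue:
        current = queue.popleft()
        # A product state is accepting when every component is final.
        if all(q in fin for q, (_, _, fin) in zip(current, nfas)):
            return True
        for symbol in alphabet:
            successors = [delta.get((q, symbol), ()) for q, (_, delta, _) in zip(current, nfas)]
            for nxt in product(*successors):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


even = ({0}, {(0, "a"): {1}, (1, "a"): {0}}, {0})                     # even length
mult3 = ({0}, {(0, "a"): {1}, (1, "a"): {2}, (2, "a"): {0}}, {0})     # length divisible by 3
nonempty = ({0}, {(0, "a"): {1}, (1, "a"): {1}}, {1})                 # length at least 1
print(intersection_nonempty([even, mult3, nonempty], "a"))
```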

For the upper bound, we use the following idea: First we show how to construct, in
exponential time, the following for each $m$-ary recognizable relation $R$, binary rational
relation $S$ and acyclic $I \subseteq [m]^2$: An $m$-tape automaton $A(R,S,I)$ that accepts precisely
those $w = (w_1, \dots, w_m) \in (\Sigma^*)^m$ such that $w \in R$ and $(w_i, w_j) \in S$, for each $(i,j) \in I$.
Intuitively, $A(R, S, I)$ represents the "synchronization" of the transducer that accepts $R$
with a copy of the 2-tape automaton that recognizes $S$ over each projection defined by the
pairs in $I$. Such synchronization is possible since $I$ is acyclic. Hence, in order to solve
$\mathrm{GenInt}_S(\mathrm{REC})$ we only need to check $A(R, S, I)$ for nonemptiness. The latter can be done
in PSPACE by the standard "on-the-fly" reachability analysis. We proceed with the details
of the construction below.

Recall that rational relations are the ones defined by $n$-tape automata. We start by
formally defining the class of $n$-tape automata that we use in this proof. An $n$-tape
automaton, $n > 0$, is a tuple $A = (Q, \Sigma, Q_0, \delta, F)$, where $Q$ is a finite set of control states, $\Sigma$
is a finite alphabet, $Q_0 \subseteq Q$ is the set of initial states, $\delta : Q \times (\Sigma \cup \{\varepsilon\})^n \to 2^{Q \times ([n] \cup \{[n]\})}$ is
the transition function with $\varepsilon$ a symbol not appearing in $\Sigma$, and $F \subseteq Q$ is the set of final
states. Intuitively, the transition function specifies how $A$ moves in a situation when it is in
state $q$ reading symbol $a \in (\Sigma \cup \{\varepsilon\})^n$: If $(q', j) \in \delta(q, a)$, where $j \in [n]$, then $A$ is allowed to enter
state $q'$ and move its $j$-th head one position to the right of its tape. If $(q', [n]) \in \delta(q, a)$
then $A$ is allowed to enter state $q'$ and move each one of its heads one position to the right
of its tape.

Given a tuple $w = (w_1, \dots, w_n) \in (\Sigma^*)^n$ such that $w_i$ is of length $p_i \geq 0$, for each
$1 \leq i \leq n$, a run of $A$ over $w$ is a sequence $q_0\, P_0\, q_1\, P_1 \cdots q_{k-1}\, P_{k-1}\, q_k$, for $k > 0$, such that:

(1) $q_i \in Q$, for each $0 \leq i \leq k$,

(2) $q_0 \in Q_0$,

(3) $P_i$ is a tuple in $([p_1] \cup \{0\}) \times \cdots \times ([p_n] \cup \{0\})$, for each $0 \leq i \leq k - 1$ (intuitively,
the $P_i$'s represent the positions of the $n$ heads of $A$ at each stage of the run. In
particular, the $j$-th component of $P_i$ represents the position of the $j$-th head of $A$
in stage $i$ of the run),

(4) $P_0 = (b_1, \dots, b_n)$, where $b_i := 0$ if $w_i$ is the empty word $\varepsilon$ (that is, $p_i = 0$) and
$b_i := 1$ otherwise (that is, the run starts by initializing each one of the $n$ heads to
be in the initial position of its tape, if possible),

(5) $P_{k-1} = (p_1, \dots, p_n)$, that is, the run ends when each head scans the last position of
its tape, and

(6) for each $0 \leq i \leq k - 1$, if $P_i = (r_1, \dots, r_n)$ and

$$(\pi_1(w)[r_1], \dots, \pi_n(w)[r_n]) = (a_1, \dots, a_n),$$

where we assume by definition that $w[0] = \varepsilon$, then $\delta(q_i, (a_1, \dots, a_n))$ contains a pair
of the form $(q_{i+1}, j)$ such that:

(a) if $i < k - 1$ then $j \in [n]$ and $P_{i+1}$ is the tuple $(r_1, \dots, r_{j-1}, r_j + 1, r_{j+1}, \dots, r_n)$.
In such case we say that $(q_{i+1}, P_{i+1})$ is a valid transition from $(q_i, P_i)$ over $w$
in the $j$-th head, and

(b) if $i = k - 1$ then $j = [n]$. This is a technical condition that ensures that each
head of $A$ should leave its tape after the last transition in the run is performed.

That is, each run is forced to respect the transition function $\delta$ when the $n$-tape
automaton $A$ is in state $q$ reading the symbols in the corresponding positions of its
$n$ heads. Further, the positions of the $n$ heads are updated in the run also according
to what is allowed by $\delta$. Notice that each transition in a run moves a single head,
except for the last one that moves all of them at the same time.

The run is accepting if $q_k \in F$ (that is, $A$ enters an accepting state after each one of its
heads scans the last position of its own tape).

Each $n$-tape automaton $A$ defines the language $L(A) \subseteq (\Sigma^*)^n$ of all those $w = (w_1, \dots,
w_n) \in (\Sigma^*)^n$ such that there is an accepting run of $A$ over $w$. It can be proved with
standard techniques that languages defined by $n$-ary rational relations are precisely those
defined by $n$-tape automata. Notice that there is an alternative, more general model of
$n$-tape automata that allows each transition to move an arbitrary number of heads. It is
easy to see that this model is equivalent in expressive power to the one we present here,
as transitions that move an arbitrary number of heads can easily be encoded by a series
of single-head transitions. We have decided to use this more restricted version of $n$-tape
automata here, as it will allow us to simplify some of the technical details in our proof.
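
To make the run semantics concrete, here is a small simulator for this restricted model, given as a sketch under our own encoding assumptions: heads are indexed from 0, the move `[n]` is represented by the constant `ALL`, the symbol $\varepsilon$ by the empty string, and `delta` is a dictionary from `(state, symbols)` to a set of `(state, move)` pairs. The example automaton for word equality over $\{a, b\}$ is also ours, chosen only to exercise the definition.

```python
from collections import deque

ALL = "all"  # stands for the move [n] that advances every head (encoding assumption)
EPS = ""     # stands for epsilon, read by a head on an empty tape


def accepts(delta, initial, finals, words):
    """Search the configuration graph (state, head positions) for an accepting run."""
    lengths = tuple(len(w) for w in words)
    start = tuple(0 if p == 0 else 1 for p in lengths)
    queue = deque((q, start) for q in initial)
    seen = set(queue)
    while queue:
        q, pos = queue.popleft()
        symbols = tuple(w[r - 1] if r > 0 else EPS for w, r in zip(words, pos))
        for q2, move in delta.get((q, symbols), ()):
            if move == ALL:
                # Only allowed as the last step, once every head scans its last position.
                if pos == lengths and q2 in finals:
                    return True
                continue
            if pos[move] < lengths[move]:
                nxt = (q2, pos[:move] + (pos[move] + 1,) + pos[move + 1:])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


def equality_automaton(alphabet):
    """A 2-tape automaton for {(w, w)}: compare aligned symbols, then advance
    head 0 and head 1 in turn."""
    delta = {("e", (EPS, EPS)): {("f", ALL)}}
    for c in alphabet:
        delta[("e", (c, c))] = {("m", 0), ("f", ALL)}
    for x in alphabet:
        for y in alphabet:
            delta[("m", (x, y))] = {("e", 1)}
    return delta, {"e"}, {"f"}


delta, initial, finals = equality_automaton("ab")
print(accepts(delta, initial, finals, ("ab", "ab")))
print(accepts(delta, initial, finals, ("ab", "aa")))
```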

Now we continue with the proof that the problem $\mathrm{GenInt}_S(\mathrm{REC})$ can be solved in
PSPACE if $I$ is acyclic (that is, it defines an acyclic undirected graph). The main technical
tool for proving this is the following lemma:

Lemma 6.8. Let $R$ be an $m$-ary relation in REC, $S$ a binary rational relation, and $I$ a
subset of $[m] \times [m]$ that defines an acyclic undirected graph. It is possible to construct,
in exponential time, an $m$-tape automaton $A(R, S, I)$ such that the language defined by
$A(R,S,I)$ is precisely the set of words $w = (w_1, \dots, w_m) \in (\Sigma^*)^m$ such that $w \in R$ and
$(w_i, w_j) \in S$ for all $(i, j) \in I$.

We start by proving the lemma. The intuitive idea is that $A(R, S, I)$ is an $m$-tape
automaton that at the same time recognizes $R$ and represents the "synchronization" of the
$|I|$ copies of the 2-tape automaton $S$ over the projections corresponding to the pairs in $I$.
Since $I$ is acyclic, such synchronization is possible.

Assume that $|I| = \ell$. Let $t_1, \dots, t_\ell$ be an arbitrary enumeration of the pairs in $I$. Also,
assume that the recognizable relation $R$ is given as

$$\bigcup_i \mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m},$$

where each $\mathcal{N}_{i_j}$ is an NFA over $\Sigma$ (without transitions on the empty word). Assume that
the set of states of $\mathcal{N}_{i_j}$ is $U_{i_j}$, its set of initial states is $U^0_{i_j}$ and its set of final states is $U^F_{i_j}$.
Further, assume that the 2-tape transducer $S$ is given by the tuple $(Q_S, \Sigma, Q^0_S, \delta_S, Q^F_S)$,
where $Q_S$ is the set of states, the set of initial states is $Q^0_S$, the set of final states is $Q^F_S$,
and $\delta_S : Q_S \times (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q_S \times (\{1,2\} \cup \{\{1,2\}\})}$ is the transition function. We
take $|I| = \ell$ disjoint copies $S_1, \dots, S_\ell$ of $S$, such that $S_i$, for each $1 \leq i \leq \ell$, is the tuple
$(Q_{S_i}, \Sigma, Q^0_{S_i}, \delta_{S_i}, Q^F_{S_i})$. Without loss of generality we assume that if $t_i = (j, j') \in [m] \times [m]$
then $\delta_{S_i}$ is a function from $Q_{S_i} \times (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\})$ into $2^{Q_{S_i} \times (\{j,j'\} \cup \{\{j,j'\}\})}$. We can do
this because $I$ is acyclic, and hence $j \neq j'$.

The $m$-tape automaton $A(R, S, I)$ is defined as the tuple $(Q, \Sigma, Q_0, \delta, F)$, where:

(1) The set of states $Q$ is

$$\bigcup_i \big(U_{i_1} \times \cdots \times U_{i_m} \times Q_{S_1} \times \cdots \times Q_{S_\ell}\big).$$

(2) The initial states in $Q_0$ are precisely those in

$$\bigcup_i \big(U^0_{i_1} \times \cdots \times U^0_{i_m} \times Q^0_{S_1} \times \cdots \times Q^0_{S_\ell}\big).$$

(3) The final states in $F$ are precisely those in

$$\bigcup_i \big(U^F_{i_1} \times \cdots \times U^F_{i_m} \times Q^F_{S_1} \times \cdots \times Q^F_{S_\ell}\big).$$

(4) The transition function $\delta : Q \times (\Sigma \cup \{\varepsilon\})^m \to 2^{Q \times ([m] \cup \{[m]\})}$ is defined as follows on
state $q \in Q$ and symbol $a \in (\Sigma \cup \{\varepsilon\})^m$. Assume that $q = (u_{i_1}, \dots, u_{i_m}, q_1, \dots, q_\ell)$,
where $u_{i_j} \in U_{i_j}$ for each $1 \leq j \leq m$, and $q_j \in Q_{S_j}$ for each $1 \leq j \leq \ell$. Further,
assume that $a = (a_1, \dots, a_m)$, where $a_j \in (\Sigma \cup \{\varepsilon\})$ for each $1 \leq j \leq m$. Then $\delta(q, a)$
consists of all pairs of the form $((u'_{i_1}, \dots, u'_{i_m}, q'_1, \dots, q'_\ell), j)$, for $j \in [m]$, such that:

(a) $u'_{i_k} = u_{i_k}$ for each $k \in [m] \setminus \{j\}$, and there is a transition in $\mathcal{N}_{i_j}$ from $u_{i_j}$ into
$u'_{i_j}$ labeled $a_j$; and

(b) for each $1 \leq k \leq \ell$, if $t_k$ is the pair $(k_1, k_2) \in [m] \times [m]$ then the following holds:
(1) if $j \notin \{k_1, k_2\}$ then $q_k = q'_k$, and (2) if $j \in \{k_1, k_2\}$ then $(q'_k, j)$ belongs to
$\delta_{S_k}(q_k, (a_{k_1}, a_{k_2}))$,

plus all pairs of the form $((u'_{i_1}, \dots, u'_{i_m}, q'_1, \dots, q'_\ell), [m])$ such that:

(a) for each $1 \leq k \leq m$ there is a transition in $\mathcal{N}_{i_k}$ from $u_{i_k}$ into $u'_{i_k}$ labeled $a_k$;
and

(b) for each $1 \leq k \leq \ell$, if $t_k$ is the pair $(k_1, k_2) \in [m] \times [m]$ then $(q'_k, \{k_1, k_2\})$
belongs to $\delta_{S_k}(q_k, (a_{k_1}, a_{k_2}))$.

Intuitively, $\delta$ defines possible transitions of $A(R, S, I)$ that respect the transition
function of each one of the copies of $S$ over its respective projection. Further, while
scanning its tapes the automaton $A(R, S, I)$ also checks that there is an $i$ such that
for each $1 \leq j \leq m$ the $j$-th tape contains a word in the language defined by $\mathcal{N}_{i_j}$.

Clearly, $A(R, S, I)$ can be constructed in exponential time from $R$, $S$ and $I$. Notice, however,
that states of $A(R, S, I)$ are of polynomial size.

We prove next that for every $w = (w_1, \dots, w_m) \in (\Sigma^*)^m$ it is the case that $w$ is
accepted by $A(R, S, I)$ if and only if $w$ belongs to the language of $R$ and $(w_i, w_j) \in S$, for
each $(i,j) \in I$.

($\Rightarrow$) Assume first that $w = (w_1, \dots, w_m) \in (\Sigma^*)^m$ is accepted by $A(R, S, I)$. It is easy to
see from the way $A(R, S, I)$ is defined that, for some $i$, the projection of the accepting run
of $A(R, S, I)$ on each $1 \leq j \leq m$ defines an accepting run of $\mathcal{N}_{i_j}$ over $w_j$. Further, for each
$(j, k) \in I$ it is the case that the projection of the accepting run of $A(R, S, I)$ on $(j, k)$ defines
an accepting run of $S$ over $(w_j, w_k)$. We conclude that $w$ belongs to the language of $R$ and
$(w_j, w_k) \in S$, for each $(j, k) \in I$.

($\Leftarrow$) Assume, on the other hand, that $w = (w_1, \dots, w_m) \in (\Sigma^*)^m$ belongs to the language
of $R$ and $(w_i, w_j) \in S$, for each $(i, j) \in I$. Further, assume that the length of $w_i$ is $p_i \geq 0$,
for each $1 \leq i \leq m$. We prove next that $w$ is accepted by $A(R, S, I)$.

Since $w \in R$ it must be the case that $w$ is accepted by $\mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m}$, for some $i$. Let
us assume that

$$\rho_{i_j} := u_{i_j,0}\ (1)\ u_{i_j,1}\ (2) \cdots u_{i_j,p_j-1}\ (p_j)\ u_{i_j,p_j}$$

is an accepting run of the 1-tape automaton $\mathcal{N}_{i_j}$ over $w_j$, for each $1 \leq j \leq m$. Since for
every $t_j$ ($1 \leq j \leq \ell$) of the form $(k, k') \in [m] \times [m]$ it is the case that $(w_k, w_{k'}) \in S$, there is
an accepting run

$$\lambda_j := q_{j,0}\ P_{j,0}\ q_{j,1}\ P_{j,1} \cdots q_{j,r_j}\ P_{j,r_j}\ q_{j,r_j+1}$$

of $S_j$ over $(w_k, w_{k'})$. We then inductively define a sequence

$$q_0\, P_0\, q_1\, P_1 \cdots$$

where each $q_j$ is a state of $Q$ and each $P_j$ is a tuple in $([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$, as
follows:

(1) $q_0 := (u_{i_1,0}, \dots, u_{i_m,0}, q_{1,0}, \dots, q_{\ell,0})$.

(2) $P_0 = (b_1, \dots, b_m)$, where $b_i := 0$ if $w_i$ is the empty word and $b_i := 1$ otherwise.

(3) Let $j \geq 0$. Assume that $q_j = (u_{i_1}, \dots, u_{i_m}, q_1, \dots, q_\ell)$, where each $u_{i_k}$ is a state in
$\mathcal{N}_{i_k}$ and each $q_k$ is a state in $S_k$, and that $P_j = (r_1, \dots, r_m) \in ([p_1] \cup \{0\}) \times \cdots \times
([p_m] \cup \{0\})$.

If for every $1 \leq k \leq m$ it is the case that $r_k = p_k$ then the sequence stops.
Otherwise it proceeds as follows.

If for some $1 \leq k \leq m$ it is the case that $u_{i_k}(r_k)$ is not a subword of the accepting
run $\rho_{i_k}$, or that for some $1 \leq k \leq \ell$ such that $t_k = (k_1, k_2) \in [m] \times [m]$ it is the
case that $q_k(r_{k_1}, r_{k_2})$ is not a subword of the accepting run $\lambda_k$, then the sequence
simply fails. (Notice that $\rho_{i_k}$ is a word in the language defined by $(U_{i_k} \cdot [p_k])^* \cdot U_{i_k}$, and
hence it is completely well-defined whether a word in $U_{i_k} \cdot [p_k]$ is or is not a subword of
$\rho_{i_k}$. The condition on $\lambda_k$ is well-defined for essentially the same reasons.)

Otherwise check whether there is a $1 \leq k \leq m$ such that the following holds:

(a) $r_k \neq p_k$.

(b) For each pair $t_{k_1} \in I$ of the form $(k, k') \in [m] \times [m]$ it is the case that
if $q'_{k_1}(r'_k, r'_{k'})$ is the subword in $Q_{S_{k_1}} \cdot ([p_k] \times [p_{k'}])$ that immediately follows
$q_{k_1}(r_k, r_{k'})$ in the run $\lambda_{k_1}$ then $r'_k = r_k + 1$, and $r'_{k'} = r_{k'}$.

(c) For each pair $t_{k_1} \in I$ of the form $(k', k) \in [m] \times [m]$ it is the case that
if $q'_{k_1}(r'_{k'}, r'_k)$ is the subword in $Q_{S_{k_1}} \cdot ([p_{k'}] \times [p_k])$ that immediately follows
$q_{k_1}(r_{k'}, r_k)$ in the run $\lambda_{k_1}$ then $r'_k = r_k + 1$, and $r'_{k'} = r_{k'}$.

Intuitively, this states that we can move the $k$-th head of $A(R, S, I)$ and preserve
the transitions on each run of the form $\lambda_{k_1}$ such that $S_{k_1}$ is a copy of $S$ that has
one of its components reading tape $k$.

If no such $k$ exists the sequence fails. Otherwise pick the least $1 \leq k \leq m$
that satisfies the conditions above, and continue the sequence by defining the pair
$(q_{j+1}, P_{j+1})$ as

$$\big((u_{i_1}, \dots, u_{i_{k-1}}, u'_{i_k}, u_{i_{k+1}}, \dots, u_{i_m}, q'_1, \dots, q'_\ell),\ (r_1, \dots, r_{k-1}, r_k + 1, r_{k+1}, \dots, r_m)\big),$$

where the following holds:

(a) $u'_{i_k}(r_k + 1)$ is the subword in $U_{i_k} \cdot [p_k]$ that immediately follows $u_{i_k}(r_k)$ in $\rho_{i_k}$.

(b) For each pair $t_{k_1} \in I$ of the form $(k, k') \in [m] \times [m]$, it is the case that $q'_{k_1}$ satisfies
that $q'_{k_1}(r_k + 1, r_{k'})$ is the subword in $Q_{S_{k_1}} \cdot ([p_k] \times [p_{k'}])$ that immediately follows
$q_{k_1}(r_k, r_{k'})$ in the run $\lambda_{k_1}$.

(c) For each pair $t_{k_1} \in I$ of the form $(k', k) \in [m] \times [m]$, it is the case that $q'_{k_1}$ satisfies
that $q'_{k_1}(r_{k'}, r_k + 1)$ is the subword in $Q_{S_{k_1}} \cdot ([p_{k'}] \times [p_k])$ that immediately follows
$q_{k_1}(r_{k'}, r_k)$ in the run $\lambda_{k_1}$.

(d) For each pair $t_{k_1} \in I$ of the form $(k', k'') \in [m] \times [m]$ such that $k' \neq k$ and
$k'' \neq k$, it is the case that $q'_{k_1} = q_{k_1}$.

In this case we say that $(q_{j+1}, P_{j+1})$ is obtained from $(q_j, P_j)$ by performing a
transition on the $k$-th head.



We first prove by induction the following crucial property of the sequence $q_0 P_0 q_1 P_1 \cdots$:
The sequence does not fail at any stage $j \geq 0$. Clearly, the sequence does not fail in stage
0 given by pair $(q_0, P_0)$. Assume now by induction that the sequence has not failed until
stage $j \geq 0$ given by pair $(q_j, P_j)$, and, further, that the sequence does not stop in stage $j$.
We prove next that the sequence does not fail in stage $j + 1$.

If the sequence stops in stage $j + 1$ it clearly does not fail. Assume then that the
sequence does not stop in stage $j + 1$. Also, assume that $q_j = (u_{i_1}, \dots, u_{i_m}, q_1, \dots, q_\ell)$,
where each $u_{i_k}$ is a state in $\mathcal{N}_{i_k}$ and each $q_k$ is a state in $S_k$. Further, assume that $P_j =
(r_1, \dots, r_m) \in ([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$. Since the sequence did not stop in stage
$j$ it must be the case that for every $1 \leq k \leq m$ the sequence $u_{i_k}(r_k)$ is a subword of the
accepting run $\rho_{i_k}$, and that for every $1 \leq k \leq \ell$ such that $t_k = (k_1, k_2) \in [m] \times [m]$ the
sequence $q_k(r_{k_1}, r_{k_2})$ is a subword of the accepting run $\lambda_k$.

Assume that $(q_{j+1}, P_{j+1})$ is obtained from $(q_j, P_j)$ by performing a transition on the
$k$-th head, for $1 \leq k \leq m$. Then the pair $(q_{j+1}, P_{j+1})$ is of the form:

$$\big((u'_{i_1}, \dots, u'_{i_m}, q'_1, \dots, q'_\ell),\ (r'_1, \dots, r'_m)\big),$$

where the following holds:

(1) $u'_{i_{k'}} = u_{i_{k'}}$, for each $k' \in [m] \setminus \{k\}$,

(2) $u'_{i_k}(r_k + 1)$ is the subword in $U_{i_k} \cdot [p_k]$ that immediately follows $u_{i_k}(r_k)$ in $\rho_{i_k}$,

(3) $r'_{k'} = r_{k'}$, for each $k' \in [m] \setminus \{k\}$,

(4) $r'_k = r_k + 1$,

(5) for each pair $t_{k_1} \in I$ of the form $(k, k') \in [m] \times [m]$, it is the case that $q'_{k_1}$ satisfies
that $q'_{k_1}(r_k + 1, r_{k'})$ is the subword in $Q_{S_{k_1}} \cdot ([p_k] \times [p_{k'}])$ that immediately follows
$q_{k_1}(r_k, r_{k'})$ in the run $\lambda_{k_1}$,

(6) for each pair $t_{k_1} \in I$ of the form $(k', k) \in [m] \times [m]$, it is the case that $q'_{k_1}$ satisfies
that $q'_{k_1}(r_{k'}, r_k + 1)$ is the subword in $Q_{S_{k_1}} \cdot ([p_{k'}] \times [p_k])$ that immediately follows
$q_{k_1}(r_{k'}, r_k)$ in the run $\lambda_{k_1}$, and

(7) for each pair $t_{k_1} \in I$ of the form $(k', k'') \in [m] \times [m]$ such that $k' \neq k$ and $k'' \neq k$,
it is the case that $q'_{k_1} = q_{k_1}$.

Notice, since $A(R, S, I)$ does not allow empty transitions, that $q'_{k_1}(r'_k, r'_{k'})$ is well-defined since the
subword $q_{k_1}(r_k, r_{k'})$ appears exactly once in the run $\lambda_{k_1}$ and, further, $q_{k_1}(r_k, r_{k'})$ is followed in $\lambda_{k_1}$ by a
subword in $Q_{S_{k_1}} \cdot ([p_k] \times [p_{k'}])$ because $r_k \neq p_k$.

Then, by inductive hypothesis, it is the case that for every $k' \in [m] \setminus \{k\}$ the sequence
$u'_{i_{k'}}(r'_{k'})$ is a subword of the accepting run $\rho_{i_{k'}}$. For the same reason, for every $1 \leq k' \leq \ell$
such that $t_{k'} = (k_1, k_2) \in [m] \times [m]$, $k_1 \neq k$ and $k_2 \neq k$, it is the case that $q'_{k'}(r'_{k_1}, r'_{k_2})$ is a
subword of the accepting run $\lambda_{k'}$. Further, simply by definition $u'_{i_k}(r'_k)$ is a subword of the
accepting run $\rho_{i_k}$. Also, by definition, for each pair $t_{k_1} \in I$ of the form $(k', k) \in [m] \times [m]$,
it is the case that $q'_{k_1}(r'_{k'}, r'_k)$ is a subword of the accepting run $\lambda_{k_1}$, and, similarly, for each
pair $t_{k_1} \in I$ of the form $(k, k') \in [m] \times [m]$, it is the case that $q'_{k_1}(r'_k, r'_{k'})$ is a subword of the
accepting run $\lambda_{k_1}$. Hence, in order to prove that the sequence does not fail in stage $j + 1$
it is enough to show that there is a $1 \leq h \leq m$ such that some pair of the form $(q, P)$,
where $q \in Q$ and $P \in ([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$, can be obtained from $(q_{j+1}, P_{j+1})$ by
performing a transition on the $h$-th head.

Since the sequence does not stop in stage $j + 1$, the set $\mathcal{H} = \{1 \leq h' \leq m \mid r'_{h'} \neq p_{h'}\}$
must be nonempty. Let $h_1$ be the least element in $\mathcal{H}$. Since the underlying undirected graph
of $I$ is acyclic, the connected component of $I$ to which $h_1$ belongs is a tree $T$. Without loss
of generality we assume that $T$ is rooted at $h_1$.

We start by trying to prove that there is a pair of the form $(q, P)$, where $q \in Q$ and
$P \in ([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$, that can be obtained from $(q_{j+1}, P_{j+1})$ by performing a
transition on the $h_1$-th head. If this is the case we are done and the proof finishes. Assume
otherwise. Then we can assume without loss of generality that there is a pair
$t_{k'} \in I$ of the form $(h_1, h_2) \in [m] \times [m]$ such that the subword in $Q_{S_{k'}} \cdot ([p_{h_1}] \times [p_{h_2}])$ that
immediately follows $q'_{k'}(r'_{h_1}, r'_{h_2})$ in the run $\lambda_{k'}$ is of the form $q''_{k'}(r'_{h_1}, r'_{h_2} + 1)$. (That is,
the run $\lambda_{k'}$ continues from $q'_{k'}(r'_{h_1}, r'_{h_2})$ by moving its second head). The other possibility
is that there is a pair $t_{k''} \in I$ of the form $(h_2, h_1) \in [m] \times [m]$ such that the
subword in $Q_{S_{k''}} \cdot ([p_{h_2}] \times [p_{h_1}])$ that immediately follows $q'_{k''}(r'_{h_2}, r'_{h_1})$ in the run $\lambda_{k''}$ is of
the form $q''_{k''}(r'_{h_2} + 1, r'_{h_1})$. But this case is completely symmetric to the previous one.

We then continue by trying to show that there is a pair of the form $(q, P)$, where $q \in Q$
and $P \in ([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$, that can be obtained from
$(q_{j+1}, P_{j+1})$ by performing a transition on the $h_2$-th head. If this is the case then we are
done and the proof finishes. Assume otherwise. Then again we can assume without loss of generality
that there is a pair $t_{k''} \in I$ of the form $(h_2, h_3) \in [m] \times [m]$ such that the subword in
$Q_S \cdot ([p_{h_2}] \times [p_{h_3}])$ that immediately follows $q'_{k''}(r'_{h_2}, r'_{h_3})$ in the run
$\lambda_{k''}$ is of the form $q''_{k''}(r'_{h_2}, r'_{h_3} + 1)$. (That is, the run $\lambda_{k''}$ continues from
$q'_{k''}(r'_{h_2}, r'_{h_3})$ by moving its second head.)

Since $T$ is acyclic and finite, if we iteratively continue in this way from $h_2$ we will
either have to find some $h \in \mathcal{T}_1$ such that there is a pair of the form $(q, P)$, where $q \in Q$ and
$P \in ([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$, that can be obtained from $(q_{j+1}, P_{j+1})$ by
performing a transition on the $h$-th head, or we will have to stop in some $h \in \mathcal{T}_1$ that is a leaf
in $T$. But clearly for this $h$ it must be possible to show that there is a pair of the form $(q, P)$,
where $q \in Q$ and $P \in ([p_1] \cup \{0\}) \times \cdots \times ([p_m] \cup \{0\})$, that can be obtained from
$(q_{j+1}, P_{j+1})$ by performing a transition on the $h$-th head. This shows that the sequence does
not fail in stage $j + 1$.

We now continue with the proof of the first part of the theorem. Since the sequence
does not fail, and from stage $j$ into stage $j + 1$ the position of at least one head moves to the
right of its tape, the sequence must stop in some stage $j \ge 0$ with associated pair $(q_j, P_j)$.
Then $P_j = (p_1, \ldots, p_m)$. Assume that $q_j = (u_{i_1}, \ldots, u_{i_m}, q_1, \ldots, q_\ell)$, where each
$u_{i_k}$ is a state in $\mathcal{N}_{i_k}$, and each $q_k$ is a state in $S$. Then, from the properties of the
sequence, it must be the case that $u_{i_k}(p_k)$ appears as a subword in the accepting run $\rho_{i_k}$,
for each $1 \le k \le m$, and for each $1 \le k \le \ell$ such that $t_k = (k_1, k_2) \in [m] \times [m]$ it is the
case that $q_k(p_{k_1}, p_{k_2})$ appears as a subword in the accepting run $\lambda_k$. Hence
$u_{i_k} = u_{i_k, p_k - 1}$ and $q_k = q_{k, r_k}$.

It easily follows from the definition of the sequence $(q_0, P_0)(q_1, P_1) \cdots$ and the transition
function $\delta$ of $A(R, S, I)$ that the following holds for each $k < j$: if $(q_{k+1}, P_{k+1})$ is obtained
from $(q_k, P_k)$ by performing a transition on the $k'$-th head, $1 \le k' \le m$, then $(q_{k+1}, P_{k+1})$
is a valid transition from $(q_k, P_k)$ over $\bar{w}$ in the $k'$-th head. Further, assume that

$$\bar{a} = \bigl((\pi_1(\bar{w}))[p_1], \ldots, (\pi_m(\bar{w}))[p_m]\bigr),$$

then $\delta(q_j, \bar{a})$ contains a pair of the form $(q_{j+1}, \{[m]\})$, where:

$$q_{j+1} := (u_{i_1, p_1}, \ldots, u_{i_m, p_m}, q_{1, r_1 + 1}, \ldots, q_{\ell, r_\ell + 1}).$$

Clearly, $q_{j+1} \in F$ (that is, $q_{j+1}$ is a final state of $A(R, S, I)$) and we conclude that
$q_0 P_0 q_1 P_1 \cdots q_j P_j q_{j+1}$ is an accepting run of $A(R, S, I)$ over $\bar{w}$, which was to be proved.

We now explain how Theorem 6.7 follows from Lemma 6.8. The lemma tells us that
in order to solve acyclic instances of GenInt$_S$(REC) we can construct, from the $m$-ary
recognizable relation $R$, the binary rational relation $S$ and the acyclic $I \subseteq [m] \times [m]$, the
$m$-tape automaton $A(R, S, I)$, and then check $A(R, S, I)$ for nonemptiness. The latter can
be done in polynomial time in the size of $A(R, S, I)$ by performing a simple reachability
analysis in the states of $A(R, S, I)$. This gives us a simple exponential time bound for
the complexity of solving acyclic instances of GenInt$_S$(REC). However, as we mentioned
before, each state in $A(R, S, I)$ is of polynomial size. Thus, checking whether $A(R, S, I)$
is nonempty can be done in nondeterministic PSpace by using a standard "on-the-fly"
construction of $A(R, S, I)$ as follows: whenever the reachability algorithm for checking
emptiness of $A(R, S, I)$ wants to move from a state $r_1$ of $A(R, S, I)$ to a state $r_2$, it guesses
$r_2$ and checks whether there is a transition from $r_1$ to $r_2$. Once this is done, the algorithm
can discard $r_1$ and continue from $r_2$. Thus, at each step, the algorithm needs to keep track of at
most two states, each one of polynomial size. From Savitch's theorem, we know that PSpace
equals nondeterministic PSpace. This shows that acyclic instances of GenInt$_S$(REC) can
be solved in PSpace.
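The on-the-fly idea can be illustrated with a small Python sketch. It is only an approximation of the argument above: it explores the state space deterministically with breadth-first search and stores the visited states, so it does not achieve the space bound of the nondeterministic procedure. The names `is_nonempty`, `successors` and `is_final` are hypothetical; the point is merely that the automaton is never materialized, since states are produced on demand by a successor function.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def is_nonempty(initial: Iterable[Hashable],
                successors: Callable[[Hashable], Iterable[Hashable]],
                is_final: Callable[[Hashable], bool]) -> bool:
    """Return True iff some final state is reachable from an initial state.

    The automaton is given implicitly: `successors` computes the states
    reachable in one transition, so no transition table is ever built.
    """
    frontier = deque(initial)
    seen = set(frontier)
    while frontier:
        state = frontier.popleft()
        if is_final(state):
            return True
        for nxt in successors(state):
            if nxt not in seen:  # explore each state at most once
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Toy product of two counters (mod 3 and mod 5) that advance in lockstep.
def step(state):
    return [((state[0] + 1) % 3, (state[1] + 1) % 5)]


print(is_nonempty([(0, 0)], step, lambda s: s == (2, 4)))
```

A state of the toy product is a pair, mirroring the fact that a state of $A(R, S, I)$ is a tuple of component states of polynomial size.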



The proof of the second part of the theorem is by an easy reduction from the PCP
problem (e.g. in the style of the proof of the second part of Theorem 6.10). □

6.3. CRPQs with rational relations. The acyclicity condition gives us a robust class
of queries, with an easy syntactic definition, that can be extended with arbitrary rational
relations. Note that acyclicity is a very standard restriction imposed on database queries to
achieve better behavior, often with respect to complexity; it is in general known to be easy
to enforce syntactically, and to yield benefits from both the semantics and query evaluation
point of view. This is the approach we follow here.

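Recall that CRPQ($S$) queries are those of the form

$$\varphi(\bar{x}) = \exists \bar{y}\, \Bigl( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u'_i) \;\wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Bigr);$$

see (4.2) in Sec. 4. We call such a query acyclic if $G_I$, the underlying undirected graph of $I$,
is acyclic.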
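The acyclicity condition is easy to test. The following Python sketch uses a union-find structure: an undirected edge closes a cycle exactly when its endpoints are already connected. Two conventions here are assumptions made for illustration and are not taken from the text: the pairs $(i, j)$ and $(j, i)$ are collapsed into a single undirected edge, and a pair $(i, i)$ is treated as a cycle. The function name is hypothetical.

```python
def underlying_graph_is_acyclic(m, I):
    """Check whether the undirected graph underlying I (a set of pairs over 1..m) is a forest."""
    parent = list(range(m + 1))  # index 0 is unused

    def find(x):
        # Find the representative of x, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    seen_edges = set()
    for (i, j) in I:
        if i == j:
            return False  # assumption: a self-loop counts as a cycle
        edge = frozenset((i, j))
        if edge in seen_edges:
            continue  # assumption: (i, j) and (j, i) are the same undirected edge
        seen_edges.add(edge)
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # endpoints already connected: this edge closes a cycle
        parent[ri] = rj
    return True


print(underlying_graph_is_acyclic(3, {(1, 2), (2, 3)}))
```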

Theorem 6.9. The query evaluation problem for acyclic CRPQ($S$) queries is decidable for
every binary rational relation $S$. Its combined complexity is PSpace-complete, and data
complexity is NLogSpace-complete.

Proof. We provide a nondeterministic PSpace algorithm that solves the query evaluation
problem when we assume the query to be part of the input (i.e. combined complexity).
Then the result will follow from Savitch's theorem, which states that PSpace equals
nondeterministic PSpace.

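Given a graph $G$, a tuple $\bar{a}$ of nodes, and an acyclic CRPQ($S$) query of the form

$$\varphi(\bar{x}) = \exists \bar{y}\, \Bigl( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u'_i) \;\wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Bigr),$$

the algorithm starts by guessing a polynomial size assignment $\bar{b}$ for the existentially
quantified variables of $\varphi(\bar{x})$, that is, the variables in $\bar{y}$. It then checks that
$G \models \psi(\bar{a}, \bar{b})$, assuming that $\psi(\bar{x}, \bar{y})$ is the CRPQ($S$) formula

$$\Bigl( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u'_i) \;\wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Bigr).$$

If this is the case the algorithm accepts and declares that $G \models \varphi(\bar{a})$. Otherwise it rejects
and declares that $G \not\models \varphi(\bar{a})$.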

By using essentially the same techniques as in the proof of Lemma 4.1, one can show
that there is a polynomial time translation that, given $G$ and $\psi(\bar{a}, \bar{b})$, constructs an acyclic
instance of GenInt$_S$(REC) such that the answer to this instance is 'yes' iff $G \models \psi(\bar{a}, \bar{b})$.
From Theorem 6.7 we know that acyclic instances of GenInt$_S$(REC) can be solved in
PSpace, and hence that the algorithm described above can be performed in nondeterministic
PSpace.

With respect to the data complexity, we start with the following observation. Acyclic
instances of GenInt$_S$(REC) can be solved in NLogSpace for $m$-ary relations in REC, if we
assume $m$ to be fixed. The proof of this fact mimics the proof of the PSpace upper bound
in Theorem 6.7, but this time we assume the arity of $R$ to be fixed. In such a case $A(R, S, I)$
is of polynomial size, and each one of its states is of logarithmic size. We can easily check
$A(R, S, I)$ for nonemptiness in NLogSpace in this case, by performing a standard
"on-the-fly" reachability analysis.

We provide an NLogSpace algorithm that solves the query evaluation problem when
we assume the query to be fixed (i.e. data complexity). Consider a fixed acyclic CRPQ($S$)
query of the form

$$\varphi(\bar{x}) = \exists \bar{y}\, \Bigl( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u'_i) \;\wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Bigr).$$

Given a graph $G$ and a tuple $\bar{a}$ of nodes, the algorithm constructs (using the proof of Lemma
4.1) in deterministic logarithmic space an acyclic instance of GenInt$_S$(REC), given by a
recognizable relation $R$ of fixed arity $m$ (this follows from the fact that $\varphi(\bar{x})$ is fixed), and
fixed $I \subseteq [m] \times [m]$, such that the answer to this instance is 'yes' iff $G \models \varphi(\bar{a})$. Since
the arity of $R$ is fixed, our previous observation tells us that we can solve the instance of
GenInt$_S$(REC) given by $R$ and $I$ in NLogSpace. But NLogSpace reductions compose,
and hence the data complexity of the query evaluation problem for acyclic CRPQ($S$) queries is
also NLogSpace. □

Thus, we get not only the possibility of extending CRPQs with rational relations but
also a good complexity of query evaluation. The NLogSpace data complexity matches
that of RPQs, CRPQs, and ECRPQs [16, 17, 4], and the combined complexity matches
that of first-order logic, or ECRPQs without extra relations.

The next natural question is whether we can recover decidability for weaker syntactic
conditions by putting restrictions on a class of relations $S$. The answer to this is positive
if we consider directed acyclicity of $I$, rather than acyclicity of the underlying undirected
graph of $I$. Then we get decidability for the class of SCR relations. In fact, we have a
dichotomy similar to that of Theorem 6.7.

Theorem 6.10.

• Let $S$ be a relation from SCR. Then $(\mathrm{REC} \cap_I S) \stackrel{?}{=} \emptyset$ is decidable in NExptime if
$I$ is a directed acyclic graph.

• There is a relation $I$ with a directed cycle and $S \in$ SCR such that
$(\mathrm{REC} \cap_I S) \stackrel{?}{=} \emptyset$ is undecidable.

Proof. We start by proving the first item. In order to do that, we first prove a small model
property for the size of the witnesses of the instances in $(\mathrm{REC} \cap_I S) \stackrel{?}{=} \emptyset$, when $S$ is a
relation in SCR and $I$ is a DAG. Let $R$ be an $m$-ary recognizable relation, $m > 0$, and
$I \subseteq [m] \times [m]$ a relation that defines a DAG. Assume that both $R$ and $S$ are over $\Sigma$. Then the
following holds: Assume $R \cap_I S \ne \emptyset$. There is $\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ of at most
exponential size that is accepted by $R$ and such that $(w_i, w_j) \in S$, for each $(i, j) \in I$. We prove
this small model property by applying usual cutting techniques.
Assume that $R$ is given as

$$\bigcup_i \mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m},$$

where each $\mathcal{N}_{i_j}$ is an NFA over $\Sigma$. Further, assume that $S$ is given as one of the 2-tape
NFAs used in the PSpace upper bound of Theorem 6.7. That is, $S$ is defined by the tuple
$(Q_S, \Sigma, Q_S^0, \delta_S, Q_S^F)$, where $Q_S$ is the set of states, the set of initial states is $Q_S^0$, the set
of final states is $Q_S^F$, and
$\delta_S : Q_S \times (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q_S \times (\{1,2\} \cup \{\{1,2\}\})}$ is the transition
function. Assume also that there is $\bar{u} = (u_1, \ldots, u_m) \in (\Sigma^*)^m$ that is accepted by $R$ such
that $(u_i, u_j) \in S$, for each $(i, j) \in I$. Then $\bar{u}$ is accepted by
$\mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m}$, for some $i$.

Since $I$ is a DAG it has a topological order on $[m]$. We assume without loss of generality
that such topological order is precisely the linear order on $[m]$. We prove the following
invariant on $1 \le \ell \le m$: There exists $\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ such that (1) $\bar{w}$ is
accepted by $R$, (2) $(w_j, w_k) \in S$, for each $(j, k) \in I$, and (3) each $w_{\ell'}$ with $\ell' \le \ell$ is of at
most exponential size. Clearly this proves our small model property on $\ell = m$. The proof
is by induction.
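The assumption that the linear order on $[m]$ is a topological order of $I$ amounts to renaming the components. A minimal Python sketch of how such an order could be computed is given below, using the standard-library `graphlib` module (Python 3.9 or later). The function name `topological_order` and the choice to return `None` on a cycle are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(m, I):
    """Return the indices 1..m in an order compatible with I, or None if I has a directed cycle."""
    # TopologicalSorter expects, for each node, the set of its predecessors.
    predecessors = {k: set() for k in range(1, m + 1)}
    for (j, k) in I:
        predecessors[k].add(j)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


print(topological_order(3, {(3, 1), (1, 2)}))
```

Renaming index `order[i]` to `i + 1` then yields an instance in which every pair $(j, k) \in I$ satisfies $j < k$.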

The basis case is $\ell = 1$. We start from $\bar{u}$ and "cut" its first component in order to
satisfy the invariant. By using standard pumping techniques it is possible to show that
there is a subsequence $w_1$ of $u_1$ of size at most $O(|\mathcal{N}_{i_1}|)$ that is accepted by $\mathcal{N}_{i_1}$. Clearly
the tuple $(w_1, u_2, \ldots, u_m)$ belongs to $R$. Further, for each pair of the form $(1, j)$ in $I$ it is the
case that $(w_1, u_j) \in S$. This is the case because $(u_1, u_j) \in S$, $w_1 \sqsubseteq u_1$ and $S \in$ SCR. Notice
that we do not need to consider pairs of the form $(j, 1)$ since we are assuming that the
linear order on $[m]$ is a topological order of $I$. This implies that $(w_1, u_2, \ldots, u_m)$ satisfies
our invariant on $\ell = 1$.
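The pumping step used in the basis case can be made concrete. The Python sketch below is one standard way to do it, not necessarily the construction the authors have in mind: it finds an accepting run and then removes every factor of the word that is read between two visits to the same state, so the surviving run visits each state at most once. The NFA encoding (a dictionary from `(state, symbol)` to a set of states) and all function names are assumptions made for this illustration.

```python
def accepting_run(delta, initial, final, word):
    """Return a list of states q0..qn forming an accepting run on word, or None."""
    # layers[i] maps each state reachable after i symbols to one predecessor.
    layers = [{q: None for q in initial}]
    for a in word:
        nxt = {}
        for q in layers[-1]:
            for q2 in delta.get((q, a), ()):
                nxt.setdefault(q2, q)
        layers.append(nxt)
    end = next((q for q in layers[-1] if q in final), None)
    if end is None:
        return None
    run = [end]
    for layer in reversed(layers[1:]):  # walk the predecessor links backwards
        run.append(layer[run[-1]])
    return run[::-1]


def cut_loops(run, word):
    """Drop every factor of word that is read between two visits of the same state."""
    out_states, out_word = [run[0]], []
    position = {run[0]: 0}  # state -> its index in out_states
    for a, q in zip(word, run[1:]):
        if q in position:
            k = position[q]  # the run returned to q: discard the loop
            for dropped in out_states[k + 1:]:
                del position[dropped]
            del out_states[k + 1:]
            del out_word[k:]
        else:
            out_states.append(q)
            out_word.append(a)
            position[q] = len(out_states) - 1
    return "".join(out_word)


def is_subsequence(short, long):
    """True iff short can be obtained from long by deleting symbols."""
    it = iter(long)
    return all(c in it for c in short)


# Toy NFA: state 0 reads 'a' to state 1, state 1 reads 'b' back to state 0.
delta = {(0, "a"): {1}, (1, "b"): {0}}
run = accepting_run(delta, {0}, {1}, "ababa")
print(cut_loops(run, "ababa"))
```

Since the shortened run has pairwise distinct states, the resulting word has fewer symbols than the automaton has states, which is the linear bound used in the text.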

Assume now that the invariant holds for $\ell < m$. Then there exists
$\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ such that (1) $\bar{w}$ is accepted by $R$, (2) $(w_j, w_k) \in S$, for
each $(j, k) \in I$, and (3) each $w_{\ell'}$ with $\ell' \le \ell$ is of at most exponential size. We proceed
to "cut" $w_{\ell+1}$ while preserving the invariant. Let $I(\ell + 1)$ be
$\{1 \le j \le \ell \mid (j, \ell + 1) \in I\}$. Let $\rho_j$ be an accepting run of $S$ over $(w_j, w_{\ell+1})$, for each
$j \in I(\ell + 1)$. Further, let $V$ be the set of all positions $1 \le k \le |w_{\ell+1}|$ such that for some
$j \in I(\ell + 1)$ the accepting run $\rho_j$ contains a subword of the form $q(k', k)\, q'(k' + 1, k)$, where
$q, q' \in Q_S$ and $1 \le k' \le |w_j|$. That is, $V$ defines the set of positions over $w_{\ell+1}$ in which
the accepting run $\rho_j$ of $S$ over $(w_j, w_{\ell+1})$, for some $j \in I(\ell + 1)$, makes a move on the head
positioned over $w_j$. Intuitively, these are the positions of $w_{\ell+1}$ that should not be "cut" in
order to maintain the invariant. Notice that the size of $V$ is bounded by
$s := \sum_{1 \le \ell' \le \ell} |w_{\ell'}|$, and hence from the inductive hypothesis the size of $V$ is
exponentially bounded.

By using standard pumping techniques it is possible to show that there is a subsequence
$w'_{\ell+1}$ of $w_{\ell+1}$ of size at most
$|\mathcal{N}_{i_{\ell+1}}| \cdot |V| \cdot |I(\ell + 1)| \cdot |Q_S| \cdot |\Sigma| + 2$, such that $w'_{\ell+1}$ is accepted by
$\mathcal{N}_{i_{\ell+1}}$ and $(w_j, w'_{\ell+1})$ is accepted by $S$, for each $j \in I(\ell + 1)$. Assume this is not the
case, and that the shortest subsequence $w'_{\ell+1}$ of $w_{\ell+1}$ that satisfies this condition is of
length strictly bigger than $|\mathcal{N}_{i_{\ell+1}}| \cdot |V| \cdot |I(\ell + 1)| \cdot |Q_S| \cdot |\Sigma| + 2$. Then there exist
two positions $1 \le i < j \le |w'_{\ell+1}|$ such that (i) $k \notin V$, for each $i \le k \le j$, (ii) the labels
of $i$ and $j$ in $w'_{\ell+1}$ coincide, (iii) the run $\rho_s$ assigns the same state to both $i$ and $j$, for
each $s \in I(\ell + 1)$, and (iv) some accepting run of $\mathcal{N}_{i_{\ell+1}}$ assigns the same state to both $i$
and $j$. Let $w''_{\ell+1}$ be the subsequence of $w'_{\ell+1}$ that is obtained by cutting all positions
$i \le k \le j - 1$. Clearly, $w''_{\ell+1}$ is shorter than $w'_{\ell+1}$ and is accepted by $\mathcal{N}_{i_{\ell+1}}$. Further,
$(w_s, w''_{\ell+1})$ is accepted by $S$, for every $s \in I(\ell + 1)$. This is because $(w_s, w''_{\ell+1})$ is
invariant with respect to the accepting run $\rho_s$, for each $s \in I(\ell + 1)$, as the cutting does not
include elements in $V$ (that is, we only cut elements in which $\rho_s$ does not need to synchronize
with the head positioned over $w_s$) and $\rho_s$ assigns the same state to both $i$ and $j$, which have,
in addition, the same label. This is a contradiction.

We claim that $\bar{w}' = (w_1, \ldots, w_\ell, w'_{\ell+1}, w_{\ell+2}, \ldots, w_m) \in (\Sigma^*)^m$ satisfies the invariant.
Clearly, $\bar{w}'$ is accepted by $R$ since $w'_{\ell+1}$ is accepted by $\mathcal{N}_{i_{\ell+1}}$ and, by inductive
hypothesis, $w_j$ is accepted by $\mathcal{N}_{i_j}$, for each $j \in [m] \setminus \{\ell + 1\}$. Further, simply by
definition it is the case that $(w_j, w'_{\ell+1}) \in S$, for each $j \in I(\ell + 1)$. Moreover,
$(w'_{\ell+1}, w_j) \in S$, for each $(\ell + 1, j) \in I$, simply because $w'_{\ell+1} \sqsubseteq w_{\ell+1}$ and $S \in$ SCR. The
remaining pairs in $I$ are satisfied by the induction hypothesis. Finally, $w'_{\ell+1}$ is of size at most
$O(|\mathcal{N}_{i_{\ell+1}}| \cdot |V| \cdot |I(\ell + 1)| \cdot |Q_S| \cdot |\Sigma|)$ and hence, by inductive hypothesis, it is of size
at most exponential. By inductive hypothesis, each $w_{\ell'}$ with $\ell' \le \ell$ is of size at most
exponential.

It is now simple to prove the first part of the theorem using the small model property.
In fact, in order to check whether $R \cap_I S \ne \emptyset$, for $S \in$ SCR, we only need to guess an
exponential size witness $\bar{w}$, and then check in polynomial time that it satisfies $R$ and each
projection in $I$ satisfies $S$. This algorithm clearly works in nondeterministic exponential
time.

Now we prove the second item. We reduce from the PCP problem. Assume that
the input to PCP consists of two equally long lists $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ of strings
over alphabet $\Sigma$. Recall that we want to decide whether there exists a solution for this input,
that is, a sequence of indices $i_1, i_2, \ldots, i_k$ such that $1 \le i_j \le n$ ($1 \le j \le k$) and
$a_{i_1} a_{i_2} \cdots a_{i_k} = b_{i_1} b_{i_2} \cdots b_{i_k}$.
Assume without loss of generality that $\Sigma$ is disjoint from $\mathbb{N}$. Corresponding to every
input $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ of PCP over alphabet $\Sigma$, we define the following:

• An alphabet $\Sigma(n) := \Sigma \cup \{1, 2, \ldots, n\}$;

• a regular language $R_{a,n} := \bigl(\bigcup_{1 \le i \le n} a_i \cdot i\bigr)^*$;

• a regular language $R_{b,n} := \bigl(\bigcup_{1 \le i \le n} b_i \cdot i\bigr)^*$.

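Consider a ternary recognizable relation $R$ over alphabet $\Sigma(n) \cup \{\star, \dagger\}$, where $\star$ and
$\dagger$ are symbols not appearing in $\Sigma(n)$, defined as

$$(\star \cdot \Sigma^*) \times (\dagger \cdot R_{a,n}) \times (\dagger \cdot R_{b,n}).$$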

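Further, consider a binary relation $S$ over $(\Sigma(n) \cup \{\star, \dagger\})^*$ defined as the union of the
following sets:

(1) $\{(w, w') \in (\dagger \cdot (\Sigma(n))^*) \times (\dagger \cdot (\Sigma(n))^*) \mid w_{\{1,\ldots,n\}} \sqsubseteq w'_{\{1,\ldots,n\}}\}$.

(2) $\{(w, w') \in (\dagger \cdot (\Sigma(n))^*) \times (\star \cdot \Sigma^*) \mid w_\Sigma \sqsubseteq w'_\Sigma\}$.

(3) $\{(w, w') \in (\star \cdot \Sigma^*) \times (\dagger \cdot (\Sigma(n))^*) \mid w_\Sigma \sqsubseteq w'_\Sigma\}$.

The intuition is that $S$ takes care that indices in the sequences are consistent. It is easy to
see that $S$ is a rational relation, which implies that $S_\sqsubseteq$ is in SCR.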

From input $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ to the PCP problem, we construct an instance of
GenInt$_{S_\sqsubseteq}$(REC) defined by the recognizable relation $R$ and

$$I = \{(1,2), (2,1), (1,3), (3,1), (2,3), (3,2)\}.$$

We claim that $R \cap_I S_\sqsubseteq \ne \emptyset$ if and only if the PCP instance given by lists $a_1, \ldots, a_n$ and
$b_1, \ldots, b_n$ has a solution.

Assume first that $R \cap_I S_\sqsubseteq \ne \emptyset$. Hence there are words $w_1 \in (\star \cdot \Sigma^*)$,
$w_2 \in (\dagger \cdot R_{a,n})$ and $w_3 \in (\dagger \cdot R_{b,n})$, such that $(w_i, w_j)$ belongs to $S_\sqsubseteq$, for each
$(i, j) \in I$. Since $(2, 3) \in I$, it must be the case that $(w_2, w_3)$ belongs to $S_\sqsubseteq$. Thus, since
the first symbol of both $w_2$ and $w_3$ is $\dagger$, it must be the case that
$(w_2)_{\{1,\ldots,n\}} \sqsubseteq (w_3)_{\{1,\ldots,n\}}$. For the same reasons, and given that $(3, 2) \in I$, it must be
the case that $(w_3)_{\{1,\ldots,n\}} \sqsubseteq (w_2)_{\{1,\ldots,n\}}$. We conclude that

$$(w_2)_{\{1,\ldots,n\}} = (w_3)_{\{1,\ldots,n\}}.$$

Since $(1, 2) \in I$, it must be the case that $(w_1, w_2)$ belongs to $S_\sqsubseteq$. Thus, since the first
symbol of $w_1$ is $\star$ and the first symbol of $w_2$ is $\dagger$, it must be the case that
$(w_1)_\Sigma \sqsubseteq (w_2)_\Sigma$. For the same reasons, and given that $(2, 1) \in I$, it must be the case that
$(w_2)_\Sigma \sqsubseteq (w_1)_\Sigma$. We conclude that $(w_1)_\Sigma = (w_2)_\Sigma$.

Mimicking the same argument, but this time using the fact that $\{(1, 3), (3, 1)\} \subseteq I$, we
conclude that $(w_1)_\Sigma = (w_3)_\Sigma$. But then $(w_2)_\Sigma = (w_3)_\Sigma$ (because $(w_1)_\Sigma = (w_2)_\Sigma$).

Assume $(w_2)_{\{1,\ldots,n\}} = (w_3)_{\{1,\ldots,n\}} = i_1 i_2 \cdots i_n$, where each $i_j \in [n]$. Then from the
fact that $(w_2)_\Sigma = (w_3)_\Sigma$ we conclude that $a_{i_1} a_{i_2} \cdots a_{i_n} = b_{i_1} b_{i_2} \cdots b_{i_n}$, and
hence that the instance of the PCP problem given by $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ has a solution.

The other direction, that is, that the fact that the instance of the PCP problem given
by $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ has a solution implies that $R \cap_I S_\sqsubseteq \ne \emptyset$, can be proved using
the same arguments. □
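The direction left to the reader can be illustrated on a concrete instance. The Python sketch below builds, from a candidate index sequence, the three words used in the reduction and checks the two projection equalities derived in the proof. It is only an illustration under stated assumptions: words are lists whose letters are one-character strings and whose indices are integers, the strings `"*"` and `"+"` stand in for the two fresh marker symbols, the function names are hypothetical, and the check does not test that the index sequence is nonempty.

```python
STAR, DAGGER = "*", "+"  # stand-ins for the two fresh marker symbols


def encode(a, b, indices):
    """Build (w1, w2, w3) from the lists a, b and a sequence of 1-based indices."""
    w1, w2, w3 = [STAR], [DAGGER], [DAGGER]
    for i in indices:
        w1 += list(a[i - 1])        # letters only
        w2 += list(a[i - 1]) + [i]  # block a_i followed by its index
        w3 += list(b[i - 1]) + [i]  # block b_i followed by its index
    return w1, w2, w3


def project(word, keep):
    """Projection of a word onto the symbols in `keep`."""
    return [x for x in word if x in keep]


def satisfies_constraints(w1, w2, w3, sigma, n):
    """Check the equalities that the symmetric pairs in I force on the projections."""
    idx = set(range(1, n + 1))
    same_indices = project(w2, idx) == project(w3, idx)
    same_letters = project(w1, sigma) == project(w2, sigma) == project(w3, sigma)
    return same_indices and same_letters


a = ["a", "ab", "bba"]
b = ["baa", "aa", "bb"]
print(satisfies_constraints(*encode(a, b, [3, 2, 3, 1]), sigma={"a", "b"}, n=3))
```

For this instance the index sequence 3, 2, 3, 1 spells the same string on both lists, so the encoded triple satisfies every constraint, whereas the single index 1 does not.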



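In particular, if we have a CRPQ($S$) query of the form

$$\varphi(\bar{x}) = \exists \bar{y}\, \Bigl( \bigwedge_{i=1}^{m} (u_i \xrightarrow{\chi_i : L_i} u'_i) \;\wedge \bigwedge_{(i,j) \in I} S(\chi_i, \chi_j) \Bigr),$$

where $I$ is acyclic (as a directed graph) and $S \in$ SCR, then query evaluation has NExptime
combined complexity.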

The proof of this result is quite different from the upper bound proof of Theorem 6.7, 
since the set of witnesses for the generalized intersection problem is no longer guaranteed 
to be rational without the undirected acyclicity condition. Instead, here we establish the 
finite-model property, which implies the result. 

Also, as a corollary to the proof of Theorem 6.10, we get the following result:

Proposition 6.11. Let $S \in$ SCR be a partial order. Then GenInt$_S$(REC) is decidable in
NExptime.

Proof. As in the previous proof, we start by proving a small model property for the size of
the witnesses of the instances in GenInt$_S$(REC), for $S$ a partial order in SCR. Let $R$ be an
$m$-ary recognizable relation, $m > 0$, and $I \subseteq [m] \times [m]$. Assume that both $R$ and $S$ are over
$\Sigma$. Then the following holds: Assume $R \cap_I S \ne \emptyset$. There is
$\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ of at most exponential size that is accepted by $R$ and such that
$(w_i, w_j) \in S$, for each $(i, j) \in I$. We prove this small model property by applying usual
cutting techniques.

Assume that $R$ is given as

$$\bigcup_i \mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m},$$

where each $\mathcal{N}_{i_j}$ is an NFA over $\Sigma$. Further, assume that $S$ is given as the 2-tape transducer
defined by the tuple $(Q_S, \Sigma, Q_S^0, \delta_S, Q_S^F)$, where $Q_S$ is the set of states, the set of initial
states is $Q_S^0$, the set of final states is $Q_S^F$, and
$\delta_S : Q_S \times (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q_S \times (\{1,2\} \cup \{\{1,2\}\})}$ is
the transition function. Assume also that there is $\bar{u} = (u_1, \ldots, u_m) \in (\Sigma^*)^m$ that is accepted
by $R$ and such that $(u_i, u_j) \in S$, for each $(i, j) \in I$. Then $\bar{u}$ is accepted by
$\mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m}$, for some $i$.

Let $I^+$ be the transitive closure of $I$. Notice, since $S$ defines a partial order over $\Sigma^*$,
that $(u_j, u_k) \in S$, for each $(j, k) \in I^+$. Further, for every pair $(j, k) \in [m] \times [m]$ such that
$\{(j, k), (k, j)\} \subseteq I^+$ we must have that $u_j = u_k$. We need to maintain such equality when
applying our cutting techniques over $\bar{u}$. In order to do that we define an equivalence relation
$\mathcal{E}_I$ over $[m]$ as follows:

$$\mathcal{E}_I := \{(j, k) \in [m] \times [m] \mid j = k \text{ or } \{(j, k), (k, j)\} \subseteq I^+\}.$$

Hence $\mathcal{E}_I$ contains all pairs $(j, k) \in [m] \times [m]$ such that $I$ implies $u_j = u_k$. Take the quotient
$[m]/\mathcal{E}_I$, and consider the restriction $I([m]/\mathcal{E}_I)$ of $I$ over $[m]/\mathcal{E}_I$, defined in the expected
way: $([j]_{\mathcal{E}_I}, [k]_{\mathcal{E}_I}) \in I([m]/\mathcal{E}_I)$ if and only if $(j', k') \in I$, for some
$j' \in [j]_{\mathcal{E}_I}$ and $k' \in [k]_{\mathcal{E}_I}$. Notice that $I([m]/\mathcal{E}_I)$ defines a DAG over $[m]/\mathcal{E}_I$.
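The quotient construction can be sketched in a few lines of Python. The code computes the transitive closure with Warshall's algorithm, groups indices that reach each other, and lifts the pairs of $I$ to classes. One detail is an assumption made for the illustration: pairs whose endpoints fall in the same class are dropped, so that the quotient has no self-loops. The function names are hypothetical.

```python
def transitive_closure(m, I):
    """Warshall's algorithm over the index set 1..m."""
    reach = set(I)
    for k in range(1, m + 1):
        for i in range(1, m + 1):
            if (i, k) in reach:
                for j in range(1, m + 1):
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def quotient(m, I):
    """Return the equivalence classes of 1..m and the pairs of I lifted to classes."""
    plus = transitive_closure(m, I)
    cls = {}
    for j in range(1, m + 1):
        # j and k are equivalent iff j = k or each reaches the other in the closure.
        cls[j] = frozenset(
            k for k in range(1, m + 1)
            if k == j or ((j, k) in plus and (k, j) in plus)
        )
    # Assumption: pairs inside a single class are dropped (no self-loops).
    edges = {(cls[j], cls[k]) for (j, k) in I if cls[j] != cls[k]}
    return set(cls.values()), edges


classes, edges = quotient(4, {(1, 2), (2, 1), (2, 3), (3, 4)})
print(sorted(sorted(c) for c in classes))
```

In the toy call, indices 1 and 2 lie on a common cycle and are merged, so the quotient has three classes connected in a chain.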

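Consider now a new input to GenInt$_S$(REC), given this time by
$I([m]/\mathcal{E}_I) \subseteq ([m]/\mathcal{E}_I) \times ([m]/\mathcal{E}_I)$, and the recognizable relation $R'$ defined as

$$R' := \bigcup_i \; \prod_{[j]_{\mathcal{E}_I} \in [m]/\mathcal{E}_I} \mathcal{N}_{i_{[j]}},$$

where $\mathcal{N}_{i_{[j]}} = \bigcap_{k \in [j]_{\mathcal{E}_I}} \mathcal{N}_{i_k}$. Notice that this new input may be of exponential
size in the size of $R$.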

Assume that $[m]/\mathcal{E}_I$ consists of $p \le m$ equivalence classes and, without loss of generality,
that these correspond to the first $p$ indices of $[m]$. Hence each product in $R'$ is of the form
$\mathcal{N}'_{i_1} \times \cdots \times \mathcal{N}'_{i_p}$, where $\mathcal{N}'_{i_j}$ is defined as the intersection of all NFAs in the
equivalence class $[j]_{\mathcal{E}_I}$. Also, $I([m]/\mathcal{E}_I)$ is the restriction of $I$ to $[p] \times [p]$. Then it must
be the case that $(u_1, \ldots, u_p) \in (\Sigma^*)^p$ belongs to $R'$ and $(u_j, u_k) \in S$, for each
$(j, k) \in I([m]/\mathcal{E}_I)$. Further, from every witness to the fact that
$R' \cap_{I([m]/\mathcal{E}_I)} S \ne \emptyset$ we can construct in polynomial time a witness to the fact that
$R \cap_I S \ne \emptyset$. Hence, in order to prove our small model property it will be enough to prove the
following: There is $\bar{w} = (w_1, \ldots, w_p) \in (\Sigma^*)^p$ of at most exponential size (in $R$) that is
accepted by $R'$ and such that $(w_j, w_k) \in S$, for each $(j, k) \in I([m]/\mathcal{E}_I)$.

The latter can be done by mimicking the inductive proof of the first part of Theorem
6.10. We only have to deal now with the issue that some of the NFAs that define $R'$ may be
exponential in the size of $R$. However, by following the inductive proof one observes that
this is not a problem, and that the same exponential bound holds in this case.

It is now simple to prove the proposition using the small model property.
In fact, in order to check whether $R \cap_I S \ne \emptyset$, for $S$ a partial order in SCR, we only need
to guess an exponential size witness $\bar{w}$, and then check in exponential time that it satisfies
$R$ and each projection in $I$ satisfies $S$. This algorithm clearly works in nondeterministic
exponential time. □

By applying similar techniques to those in the proof of Theorem 6.9 we obtain the
following.

Corollary 6.12. If $S \in$ SCR is a partial order, then CRPQ($S$) queries can be evaluated
with NExptime combined complexity. In particular, CRPQ($\sqsubseteq$) queries have NExptime
combined complexity.

We do not have at this point a matching lower bound for the complexity of CRPQ($\sqsubseteq$)
queries. Notice that an easy PSpace lower bound follows by a reduction from the
intersection problem for NFAs, as the one presented in the proof of Theorem 6.7.

The last question is whether these results can be extended to other relations considered 
here, such as subword and suffix. We do not know the result for subword (which appears 
to be hard), but we do have a matching complexity bound for the suffix relation. 

Proposition 6.13. The problem GenInt$_{\preceq_{\mathrm{suff}}}$(REC) is decidable in NExptime. In
particular, CRPQ($\preceq_{\mathrm{suff}}$) queries can be evaluated with NExptime combined complexity.

Proof. We only prove that GenInt$_{\preceq_{\mathrm{suff}}}$(REC) is decidable in NExptime. The fact that
CRPQ($\preceq_{\mathrm{suff}}$) queries can be evaluated with NExptime combined complexity follows easily
from this by applying the same techniques as in the proof of Theorem 6.9.

We start by proving a small model property for the size of the witnesses of the instances
in GenInt$_{\preceq_{\mathrm{suff}}}$(REC). Let $R$ be an $m$-ary recognizable relation, $m > 0$, and
$I \subseteq [m] \times [m]$. Assume that both $R$ and $\preceq_{\mathrm{suff}}$ are over $\Sigma$. Then the following holds:
Assume it is the case that $R \cap_I \{\preceq_{\mathrm{suff}}\} \ne \emptyset$. There is
$\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ of at most exponential size that is accepted by $R$ and such that
$w_i \preceq_{\mathrm{suff}} w_j$, for each $(i, j) \in I$. We prove this small model property by applying cutting
techniques.

Assume that $R$ is given as

$$\bigcup_i \mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m},$$

where each $\mathcal{N}_{i_j}$ is an NFA over $\Sigma$. We assume, without loss of generality, that $I$ defines
a DAG over $[m] \times [m]$. In fact, assume otherwise; that is, $I$ does not define a DAG over
$[m] \times [m]$. Since $\preceq_{\mathrm{suff}}$ defines a partial order over $\Sigma^*$, we can always reduce in polynomial
time the instance of GenInt$_{\preceq_{\mathrm{suff}}}$(REC) given by $R$ and $I$ to an "equivalent" instance of
GenInt$_{\preceq_{\mathrm{suff}}}$(REC) given by a recognizable relation $R'$ of arity $m' \le m$ and
$I' \subseteq [m'] \times [m']$ such that $I'$ defines a DAG. We already showed how to do this for an arbitrary
partial order over $\Sigma^*$ in the proof of Proposition 6.11, so we prefer not to repeat the argument
here, and simply assume that $I$ defines a DAG over $[m] \times [m]$. Since $I$ defines a DAG it has a
topological order over $[m]$. We assume without loss of generality that such topological order
is precisely the linear order on $[m]$.

Assume then that there is $\bar{u} = (u_1, \ldots, u_m) \in (\Sigma^*)^m$ that is accepted by $R$ and such
that $u_i \preceq_{\mathrm{suff}} u_j$, for each $(i, j) \in I$. Then $\bar{u}$ is accepted by
$\mathcal{N}_{i_1} \times \cdots \times \mathcal{N}_{i_m}$, for some $i$. Assume that the length of $u_j$ is $p_j > 0$, for each
$1 \le j \le m$. Our goal is to "cut" $\bar{u}$ in order to obtain an exponential size witness to the fact
that $R \cap_I \{\preceq_{\mathrm{suff}}\} \ne \emptyset$.

We recursively define the set $M_k$ of marked positions in string $u_k$, $1 \le k \le m$, as
follows:

• No position in $u_1$ is marked.

• For each $1 < k \le m$ the set $M_k$ of marked positions in $u_k$ is defined as the union
of the marked positions in $u_k$ with respect to $j$, for each $j < k$ such that $(j, k) \in I$,
where the latter is defined as follows. Assume that $M_j$ is the set of marked positions
in $u_j$. Then the set of positions $1 \le \ell \le p_k$ that are marked in $u_k$ with respect
to $j$ is $\{r + p_k - p_j \mid r = 1 \text{ or } r \in M_j\}$. (Notice that $p_k - p_j \ge 0$ since
$u_j \preceq_{\mathrm{suff}} u_k$, and hence $1 \le r + p_k - p_j \le p_k$ for each $r \in M_j$ and for $r = 1$.)

Intuitively, $M_k$ consists of those positions $1 \le \ell \le p_k$ such that for some $j < k$ with
$(j, k) \in I^+$, where $I^+$ is the transitive closure of $I$, it is the case that
$u_k = u_k[1, \ell - 1] \cdot u_j$. Or, in other words, the fact that $u_j \preceq_{\mathrm{suff}} u_k$ starts to be
"witnessed" at position $\ell$ of $u_k$. We assume the $M_k$'s to be linearly ordered by the restriction
of the linear order $1 < 2 < \cdots < m$ to $M_k$. By a simple inductive argument it is possible to
prove that the size of $M_k$ is polynomially bounded in $m$, for each $1 \le k \le m$.
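The recursive definition of the marked positions translates directly into code. The Python sketch below takes only the lengths $p_1, \ldots, p_m$ and the relation $I$ (assumed to respect the order $1 < \cdots < m$) and returns the sets $M_k$ with 1-based positions. The function name and the decision to raise an error when a length constraint is violated are assumptions made for the illustration.

```python
def marked_positions(lengths, I):
    """Compute M_1..M_m from lengths[k-1] = p_k and pairs (j, k) in I with j < k."""
    m = len(lengths)
    marked = {k: set() for k in range(1, m + 1)}  # M_1 stays empty
    for k in range(2, m + 1):
        for j in range(1, k):
            if (j, k) in I:
                shift = lengths[k - 1] - lengths[j - 1]  # p_k - p_j
                if shift < 0:
                    raise ValueError("u_j cannot be a suffix of a shorter u_k")
                # Position 1 of u_j and every marked position of u_j are shifted.
                marked[k] |= {r + shift for r in marked[j] | {1}}
    return marked


# u1 = "ab", u2 = "cab", u3 = "dcab", with u1 a suffix of u2 and u2 a suffix of u3.
print(marked_positions([2, 3, 4], {(1, 2), (2, 3)}))
```

In the toy example, position 2 of "dcab" is where "cab" starts and position 3 is where "ab" starts, which matches the intuition that a marked position is where an embedded suffix begins.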

Since $u_j \preceq_{\mathrm{suff}} u_k$, for each $(j, k) \in I$, this implies that the labels in some positions of
$u_j$ are preserved in the respective positions of $u_k$ that witness the fact that
$u_j \preceq_{\mathrm{suff}} u_k$. The important thing to notice is that, since we are dealing with $\preceq_{\mathrm{suff}}$,
the following holds: For each position $p$ that is "copied" from $u_j$ into $u_k$ in order to satisfy
$u_j \preceq_{\mathrm{suff}} u_k$, the distance from $p$ to the last element of $u_j$ equals the distance from the
copy of $p$ in $u_k$ to the last position of $u_k$. That is, distances to the last element of the string
are preserved when copying positions (and labels) in order to satisfy $I$. We need to take care of
this information when "cutting" $\bar{u}$ in order to obtain an exponential size witness for the fact
that $R \cap_I \{\preceq_{\mathrm{suff}}\} \ne \emptyset$. In order to do this we define for each
$0 \le r \le \max\{p_k \mid 1 \le k \le m\}$, a binary relation $\stackrel{r}{\to}$ on $\{u_1, \ldots, u_m\}$ such that
$u_j \stackrel{r}{\to} u_k$ if $p_j - r > 0$ and $(j, k) \in I$. This implies that position $p_j - r$ of $u_j$ is
"copied" as position $p_k - r$ of $u_k$ in order to satisfy the fact that $u_j \preceq_{\mathrm{suff}} u_k$.

But in order to consistently "cut" $\bar{u}$, we need to preserve the suffix relation both with
respect to forward and backward edges of the graph defined by $I$. In order to do that we
define $\stackrel{r}{\leftrightarrow}$ as $(\stackrel{r}{\to} \cup\, (\stackrel{r}{\to})^{-1})$. Further, since $\preceq_{\mathrm{suff}}$ is a
partial order over $\Sigma^*$, and hence it defines a transitive relation, it is important for us also to
consider the transitive closure $(\stackrel{r}{\leftrightarrow})^+$ of the binary relation $\stackrel{r}{\leftrightarrow}$.
Intuitively, $u_j (\stackrel{r}{\leftrightarrow})^+ u_k$, for $1 \le j, k \le m$, if position $p_j - r$ of $u_j$ has to be
"copied" into position $p_k - r$ of $u_k$ in order for $\bar{u}$ to satisfy the pairs in $I$ with respect to
$\preceq_{\mathrm{suff}}$.

Let $t := |\mathcal{N}_{i_1}| \cdot |\mathcal{N}_{i_2}| \cdots |\mathcal{N}_{i_m}|$ and $s := \bigl(\sum_{1 \le k \le m} |M_k|\bigr) + 1$. We
claim the following: There is $\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ such that: (1) $\bar{w}$ is accepted by
$R$, (2) $w_i \preceq_{\mathrm{suff}} w_j$, for each $(i, j) \in I$, and (3) for each $1 \le k \le m$ the number of
positions in $w_k$ between any two consecutive positions in $M_k$ is bounded by
$s \cdot t \cdot 2^m \cdot |\Sigma|^m$. This clearly implies our small model property.

Assume that $\bar{u}$ does not satisfy this. Then there exist $1 \le j \le m$ and two consecutive
positions $p$ and $p'$ in $M_j$, such that the number of positions in $u_j$ between $p$ and $p'$ is bigger
than $s \cdot t \cdot 2^m \cdot |\Sigma|^m$. But this implies that there are two positions $p_j - r$ and $p_j - r'$
($r > r'$) between $p$ and $p'$ in $u_j$ such that the following hold:

(1) $\{1 \le k \le m \mid u_j (\stackrel{r}{\leftrightarrow})^+ u_k\} = \{1 \le k \le m \mid u_j (\stackrel{r'}{\leftrightarrow})^+ u_k\}$.
Intuitively, this says that the set of strings in which position $p_j - r$ of $u_j$ is "copied"
coincides with the set of strings in which position $p_j - r'$ of $u_j$ is "copied".

(2) For each $k$ such that $u_j (\stackrel{r}{\leftrightarrow})^+ u_k$ it is the case that neither $p_k - r$ nor $p_k - r'$
is a marked position in $M_k$, and there is no marked position in $M_k$ in between $p_k - r$
and $p_k - r'$ in $u_k$.

(3) The state assigned by the accepting run of $\mathcal{N}_{i_j}$ over $u_j$ to position $p_j - r$ of $u_j$ is
the same as the one assigned to position $p_j - r'$.

(4) The state assigned by the accepting run of $\mathcal{N}_{i_k}$ over $u_k$ to the "copy" $p_k - r$ of
position $p_j - r$ over $u_k$, for each $k$ such that $u_j (\stackrel{r}{\leftrightarrow})^+ u_k$, is the same as the one
assigned to the "copy" $p_k - r'$ of position $p_j - r'$ over $u_k$.

(5) The symbol in position $p_j - r$ of $u_j$ is the same as the symbol in position $p_j - r'$ of
$u_j$.

(6) For each $k$ such that $u_j (\stackrel{r}{\leftrightarrow})^+ u_k$ it is the case that the symbol in position
$p_k - r$ of $u_k$ is the same as the symbol in position $p_k - r'$ of $u_k$.

Intuitively, this states that if we "cut" the string $u_j$ from position $p_j - r + 1$ to $p_j - r'$,
and string $u_k$ from position $p_k - r + 1$ to $p_k - r'$, for each $k$ such that
$u_j (\stackrel{r}{\leftrightarrow})^+ u_k$, then the resulting $\bar{u}' = (u'_1, \ldots, u'_m) \in (\Sigma^*)^m$ satisfies the
following: (1) $\bar{u}'$ is accepted by $R$, and (2) for each $(j, k) \in I$ it is the case that
$u'_j \preceq_{\mathrm{suff}} u'_k$. We formally prove this below. Notice for the time being that this implies
our small model property. Indeed, if we recursively apply this procedure to $\bar{u}$ we will end up with
$\bar{w} = (w_1, \ldots, w_m) \in (\Sigma^*)^m$ such that: (1) $\bar{w}$ is accepted by $R$, (2)
$w_j \preceq_{\mathrm{suff}} w_k$, for each $(j, k) \in I$, and (3) for each $1 \le k \le m$ the number of positions
in $w_k$ between any two consecutive positions in $M_k$ is bounded by $s \cdot t \cdot 2^m \cdot |\Sigma|^m$.

Let $\bar{u}' = (u'_1, \ldots, u'_m) \in (\Sigma^*)^m$ be the result of applying once the cutting procedure
described above to $\bar{u} = (u_1, \ldots, u_m)$, starting from string $u_j$ by cutting positions from
$p_j - r + 1$ to $p_j - r'$ ($r > r'$). It is not hard to see that $\bar{u}'$ is accepted by $R$, since each
$u_k$ has been cut in a way that is invariant with respect to the accepting run of $\mathcal{N}_{i_k}$ over
$u_k$. Assume that $(\ell, k) \in I$. We need to prove that $u'_\ell \preceq_{\mathrm{suff}} u'_k$. If $u_\ell = u'_\ell$ and
$u_k = u'_k$ then $u'_\ell \preceq_{\mathrm{suff}} u'_k$ by assumption. Assume then that at least one of $u_\ell$ and
$u_k$ has been cut. Suppose first that $u_\ell$ has been cut from position $p_\ell - r + 1$ to position
$p_\ell - r'$ in order to obtain $u'_\ell$. Then $u_j (\stackrel{r}{\leftrightarrow})^+ u_\ell$ and
$u_j (\stackrel{r'}{\leftrightarrow})^+ u_\ell$. Clearly, it is also the case that $u_\ell \stackrel{r}{\to} u_k$ and
$u_\ell \stackrel{r'}{\to} u_k$, which implies that $u_j (\stackrel{r}{\leftrightarrow})^+ u_k$ and
$u_j (\stackrel{r'}{\leftrightarrow})^+ u_k$. Thus, $u_k$ is also cut from position $p_k - r + 1$ to $p_k - r'$ in order
to obtain $u'_k$, and hence $u'_\ell \preceq_{\mathrm{suff}} u'_k$. Suppose, on the other hand, that $u_\ell$ has not
been cut but $u_k$ has been cut from position $p_k - r + 1$ to position $p_k - r'$ in order to obtain
$u'_k$. We consider the following cases:

(1) $r' > p_j - 1$. Then clearly $u'_\ell \preceq_{\mathrm{suff}} u'_k$.

(2) $r' < p_j - 1$ and $r > p_j - 1$. This cannot be the case, since then either $p_k - r'$ is a
marked position in $M_k$ (when $r' = p_j - 1$), or $p_k - r$ and $p_k - r'$ have a marked
position in $M_k$ in between (namely, $p_k - p_j + 1$). Either of these contradicts the fact
that a cutting of $u_k$ could be applied from position $p_k - r$ to position $p_k - r'$ in order
to obtain $u'_k$.

(3) $r' < p_j - 1$ and $r > p_j - 1$. Similar to the previous one.

(4) $r < p_j - 1$. But then clearly $u_\ell \overset{r}{\sim} u_k$ and $u_\ell \overset{r'}{\sim} u_k$, which
implies that $u_j (\overset{r}{\sim})^{*} u_\ell$ and $u_j (\overset{r'}{\sim})^{*} u_\ell$. This implies that
$u_\ell$ should have also been cut from position $p_\ell - r$
to position $p_\ell - r'$ in order to obtain $u'_\ell$, which is a contradiction.

We can finally prove the theorem using the small model property. In fact, in order to
check whether $R \cap_I {\preceq_{\mathrm{suff}}} \neq \emptyset$ we only need to guess an exponential size
witness $\bar{w}$, and then check, in time polynomial in the size of $\bar{w}$, that it satisfies $R$ and
that each projection in $I$ satisfies $\preceq_{\mathrm{suff}}$.
This algorithm clearly works in nondeterministic exponential time. $\Box$
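
The deterministic verification step of this guess-and-check procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction used in the proof: the tuple is 0-indexed, `constraints` plays the role of $I$ as a set of pairs $(j, k)$ requiring $w_j \preceq_{\mathrm{suff}} w_k$, and `accepts_R` is a hypothetical stand-in for running an automaton for $R$ on the tuple. The nondeterministic guess of the witness itself is not modelled.

```python
from typing import Callable, Iterable, Sequence, Tuple


def check_witness(
    words: Sequence[str],
    constraints: Iterable[Tuple[int, int]],
    accepts_R: Callable[[Sequence[str]], bool],
) -> bool:
    """Verify a guessed witness.

    The tuple must be accepted by R, and for every pair (j, k) in the
    constraint set, words[j] must be a suffix of words[k].
    """
    if not accepts_R(words):
        return False
    return all(words[k].endswith(words[j]) for j, k in constraints)


# Hypothetical stand-in for R: every component is a word over {a, b}.
def over_ab(ws: Sequence[str]) -> bool:
    return all(set(w) <= {"a", "b"} for w in ws)


I = {(0, 1), (1, 2)}                       # w_0 suffix of w_1, w_1 suffix of w_2
good = ("ab", "bab", "abab")
bad = ("ab", "ba", "abab")

print(check_witness(good, I, over_ab))     # suffix chain holds
print(check_witness(bad, I, over_ab))      # "ab" is not a suffix of "ba"
```

Each suffix test is linear in the length of the words involved, so the whole check is polynomial in the size of the guessed tuple, provided the membership test for $R$ is.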



7. Conclusions 

Motivated by problems arising in studying logics on graphs (as well as some verification 
problems), we studied the intersection problem for rational relations with recognizable and 
regular relations over words. We have looked at rational relations such as subword $\preceq$,
suffix $\preceq_{\mathrm{suff}}$, and subsequence $\sqsubseteq$, which are often needed in graph querying tasks. The main
results on the complexity of the intersection and generalized intersection problems, as well 
as the combined complexity of evaluating different classes of logical queries over graphs are 
summarized in Fig. 2. Several results generalizing those (e.g., to the class of SCR relations) 
were also shown. Two problems related to the interaction of the subword relation with 
recognizable relations remain open and appear to be hard. 

From the practical point of view, as rational-relation comparisons are demanded by 
many applications of graph data, our results essentially say that such comparisons should 
not be used together with regular-relation comparisons, and that they need to form acyclic 
patterns (easily enforced syntactically) for efficient evaluation.

Figure 2: Complexity of the intersection and generalized intersection problems, and
combined complexity of graph queries for subword ($\preceq$), suffix ($\preceq_{\mathrm{suff}}$), and
subsequence ($\sqsubseteq$) relations. NMR stands for non-multiply-recursive lower bound.

So far we dealt with the classical setting of graph data [1, 9, 10, 16, 17] in which the 
model of data is that of a graph with labels from a finite alphabet. In both graph data 
and verification problems it is often necessary to deal with the extended case of infinite 
alphabets (say, with graphs holding data values describing their nodes), and languages that
query both topology and data have been proposed recently [24, 27]. A natural question is
whether the positive results shown here extend to such a setting.